We place ourselves in the setting of high-dimensional statistical 
inference, where the number of variables p in a data set of interest 
is of the same order of magnitude as the number of observations 
n. More formally, we study the asymptotic properties of correlation 
and covariance matrices, in the setting where $p/n \to \rho \in (0, \infty)$, for 
general population covariance. 

We show that, for a large class of models studied in random matrix 
theory, spectral properties of large-dimensional correlation matrices 
are similar to those of large-dimensional covariance matrices. 

We also derive a Marcenko-Pastur-type system of equations for 
the limiting spectral distribution of covariance matrices computed 
from data with elliptical distributions and generalizations of this fam- 
ily. The motivation for this study comes partly from the possible rele- 
vance of such distributional assumptions to problems in econometrics 
and portfolio optimization, as well as robustness questions for certain 
classical random matrix results. 

A mathematical theme of the paper is the important use we make 
of concentration inequalities. 

1. Introduction. It is increasingly common in multivariate statistics and 
various areas of applied mathematics and computer science to have to work 
with data sets where the number of variables, p, is of the same order of 
magnitude as the number of observations, n. When studying asymptotic 
properties of estimators in this setting, usually under the assumption that 
p/n has a finite nonzero limit, we often obtain convergence results that differ 
from those obtained under the "classical" assumptions that p is fixed and n 
goes to infinity. 

A good example is provided by problems of portfolio optimization in quan- 
titative finance. Typically, if one is working with, say, the stocks making up 
the S&P 500 index over a period of one year and recording quantities daily, 
one will have to work with a data matrix of size roughly 250 x 500. For 
many "large" portfolio optimization problems, the data matrices will have 
the characteristic that p/n is not very small. Since covariance matrices (and 
their inverses) play a key role in solving a number of portfolio optimization 
problems and, in particular, Markowitz's formulation (see, e.g., [13]), it is 
important to understand how well our estimators perform in this "new" type 
of asymptotics. This should help us to assess the quality of our empirical 
choices of portfolio and their proximity to the theoretically optimal ones. 
We refer the reader who is particularly interested in these questions to [31] 
for an early and interesting application of random matrix ideas to portfolio 
optimization problems. 

The realization of the fact that there might be problems in estimating the 
spectral properties of large-dimensional covariance matrices when p/n is not 
small is not recent: the first paper in the area is probably [35], where the 
authors studied the behavior of the eigenvalues of large-dimensional sample 
covariance matrices for diagonal population covariance matrices and with 
some assumptions on the structure of the data. The surprising result they 
found was that, in the case of i.i.d. data with variance 1, the eigenvalues 
of the sample covariance matrix X*X/n do not concentrate around 1 (the 
value of all population eigenvalues), but rather are spread out on the 
interval $[(1 - \sqrt{p/n})^2, (1 + \sqrt{p/n})^2]$ when $p < n$. Moreover, their empirical 
distribution is asymptotically nonrandom. We note that this seminal paper is 
much richer than just described and refer the reader to it for more details. A 
simple lesson to be taken from this is that when p/n is not small, the sample 
covariance matrix is not a good estimator of the population covariance. 
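
The spreading of the sample eigenvalues is easy to observe numerically. The following is a minimal sketch, assuming Gaussian entries and NumPy; the helper name `sample_cov_eigs` and the sizes n = 2000, p = 500 are illustrative choices, not taken from the references.

```python
import numpy as np

def sample_cov_eigs(n, p, seed=0):
    """Eigenvalues of X'X/n for an n x p matrix X of i.i.d. N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(X.T @ X / n)

n, p = 2000, 500
eigs = sample_cov_eigs(n, p)
lower = (1 - np.sqrt(p / n)) ** 2   # left edge of the limiting support
upper = (1 + np.sqrt(p / n)) ** 2   # right edge of the limiting support
print(eigs.min(), eigs.max(), lower, upper)
```

Although every population eigenvalue equals 1 in this simulation, the sample eigenvalues should fill out an interval close to [lower, upper] rather than cluster near 1.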

Since this result, there has been a flurry of activity, especially in recent 
years, concerning the behavior of the largest eigenvalue of sample covariance 
matrices [21, 50], their fluctuation behavior in the null case [14, 19, 28, 29] 
and under alternatives [6, 16, 40], as well as fluctuation results for linear 
spectral statistics of those matrices [1, 5, 30]. Even more recently, some 
of these results have started to be used to develop better estimators of 
these large-dimensional covariance matrices ([11, 17] and [42]). We also note 
that from a statistical point of view, other approaches to estimation using 
regularization have been taken, with sometimes striking results [8, 32]. 

As noted above, the random matrix results in question concern, somewhat 
exclusively, sample covariance matrices. However, in practice, sample 
correlation matrices are often used, for instance for principal component analysis 
(PCA). A question we were asked several times by practitioners is the 
extent to which the random matrix results would hold if one were concerned 
with correlation matrices and not covariance matrices. Part of the answer 
is already known from [27], which considered the case of 
i.i.d. data. The answer was that spectral distribution results, as well as a.s. 
convergence of extreme eigenvalue results, held in this situation. However, 
in practice, the assumption of i.i.d. data is not very reasonable and, in most 
cases, practitioners would actually hope to be in the presence of an inter- 
esting covariance structure, away from the no-information case represented 
by the identity covariance matrix. In this paper, we tackle the case where 
the population covariance is not Idp and show that classic random matrix 
results hold then too, with the population covariance matrix replaced by the 
population correlation matrix. This means that recently developed methods 
that make use of random matrix theory to better estimate the eigenvalues 
of population covariance matrices can also be used to estimate the spectrum 
of population correlation matrices. 

As explained below, such results can be shown for Gaussian and some 
non-Gaussian data. Therefore, a natural question is how robust the results 
are to these distributional assumptions. In particular, a recent 
paper [20] and a recent monograph [37] make an interesting case for modeling 
financial data through elliptical distributions. As explained in [20] and [37], 
this has to do with certain tail-dependence properties that are absent from 
Gaussian data and present in a certain class of elliptically distributed data. 
As mentioned above, understanding the spectral properties of sample co- 
variance matrices with these distributions should help us better understand 
the properties of empirical solutions to the classical Markowitz portfolio 
optimization problem. This is one of the many applications these results 
could have. Though this paper does not deal with this specific problem, 
in the second part of the paper, we show that for elliptically distributed 
data (and generalizations), the spectrum of the sample covariance matrix 
is asymptotically nonrandom and we characterize the limit through the use 
of Stieltjes transforms. In particular, the result shows that the Marcenko- 
Pastur equation is not robust to deviation from the "Gaussian+" model 
usually considered in random matrix theory (see [44] and Theorem 1 below 
for an example of those assumptions). The result also explains some of the 
numerical results obtained by [20]. From a more theoretical standpoint, our 
approach allows us to break away from models for which the data vectors are 
linear transformations of random vectors with independent entries. Rather, 
what we need are concentration properties for 1-Lipschitz (with respect to 
the Euclidean norm) functionals of these data vectors. We note that some 
of our results can be obtained when the concentration properties are limited 
to convex 1-Lipschitz functions. Hence, our approach will show that some 
classical results in random matrix theory hold in wider generality than was 
previously known. For instance, it shows that classical random matrix 
results apply to data drawn from a Gaussian copula (see [38]), 
under some restrictions on the operator norm of the corresponding correla- 
tion matrix. We note that the Gaussian copula problem appears to be quite 
far from what can be obtained using currently available results. 

As it turns out, central to the proofs to be presented are the concentration 
properties of certain quadratic forms. Below, we make use of a number of 
concentration inequalities, recent and less recent. The usefulness of these 
inequalities in random matrix theory has already been illustrated in [26], in 
a different context from that which we develop below. A very good reference 
on the topic of concentration is [33]. 

The fact that we rely on concentration of quadratic forms for many of 
these results also yields some practical insights about possible limitations 
of the models considered in the random matrix literature. In particular, 
applying the concentration results to the standard random matrix models 
considered in, for example, Theorem 1 (or their generalizations in Theorem 
2, with Aj = 1 in the notation of this latter theorem) shows that the cor- 
responding data vectors have norms (when divided by the square root of 
the dimension) that are all almost equal. Similarly, one can show (see, e.g., 
[15] for more details) that the concentration results we need also imply that 
for standard models, the data vectors are almost orthogonal to one another. 
More precisely, one can show that the maximum absolute cosine of the angles 
between distinct data vectors goes to zero almost surely. Before applying or 
using these random matrix results and, in particular, the Marcenko-Pastur equation, it therefore appears 
that practitioners should pay attention to these features of the data by, for 
instance, drawing histograms of the angles between data vectors and norms 
of the data vector divided by the square root of the dimension. If those are 
not "concentrated," this might call into question the quality of the fit of 
the standard models and the relevance of the insights drawn from random 
matrix results. 
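
The diagnostic suggested above can be sketched as follows. This is only an illustration, assuming the data vectors are stored in the rows of an array `Y`; the function name `norm_and_angle_diagnostics` and the simulated Gaussian data are hypothetical.

```python
import numpy as np

def norm_and_angle_diagnostics(Y):
    """Return scaled row norms and pairwise angles (in degrees) between rows of Y."""
    n, p = Y.shape
    norms = np.linalg.norm(Y, axis=1)
    scaled_norms = norms / np.sqrt(p)            # norms divided by sqrt(dimension)
    U = Y / norms[:, None]                       # unit-length data vectors
    cosines = np.clip(U @ U.T, -1.0, 1.0)
    iu = np.triu_indices(n, k=1)                 # distinct pairs only
    angles = np.degrees(np.arccos(cosines[iu]))
    return scaled_norms, angles

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 1000))             # hypothetical data: n = 200, p = 1000
scaled_norms, angles = norm_and_angle_diagnostics(Y)
norm_hist, _ = np.histogram(scaled_norms, bins=20)
angle_hist, _ = np.histogram(angles, bins=20)
print(scaled_norms.min(), scaled_norms.max())
print(angles.min(), angles.max())
```

For data that fit the standard models, the scaled norms should cluster tightly around a common value and the angles around 90 degrees; widely spread histograms would be a warning sign.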

The paper is organized as follows. In Section 2, we study the problem of 
spectral characteristics of correlation matrices with data drawn from stan- 
dard random matrix models. In Section 3, we characterize the limiting spec- 
trum of covariance matrices computed from data drawn from a generaliza- 
tion of elliptical distributions. In terms of concentration results, Section 2 
can be viewed as using concentration results for the norm of the columns of 
the data matrices of interest. On the other hand, Section 3 relies on 
concentration properties for the rows of the data matrices of interest. 

2. On large dimensional correlation matrices. We recall that a 
correlation matrix is a matrix that contains the correlations between the entries of 
a vector. So, if $R$ is the correlation matrix of the vector $v$, 

\[ R_{i,j} = \frac{\mathrm{cov}(v_i, v_j)}{\sqrt{\mathrm{var}(v_i)\,\mathrm{var}(v_j)}}. \]

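We will naturally focus on empirical correlation matrices. We assume 
that we are given a sample of data $\{y_k\}_{k=1}^n$, where $y_k \in \mathbb{R}^p$. Let us call 
$\bar{y}_{\cdot,i} = \frac{1}{n}\sum_{k=1}^n y_{k,i}$ the mean of the ith component of our data vectors. We 
call $s_i$ the standard estimate of standard deviation of $\{y_{k,i}\}_{k=1}^n$, that is, 

\[ s_i^2 = \frac{1}{n-1} \sum_{k=1}^n (y_{k,i} - \bar{y}_{\cdot,i})^2. \]

By definition, $\hat{R}$, the empirical correlation matrix of the data, is 

\[ \hat{R}_{i,j} = \frac{\frac{1}{n-1}\sum_{k=1}^n (y_{k,i} - \bar{y}_{\cdot,i})(y_{k,j} - \bar{y}_{\cdot,j})}{s_i s_j}. \]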

These matrices play an important role in many multivariate statistical 
methods. In particular, in techniques like principal component analysis, 
there are sometimes debates as to whether one should use the correlation 
matrix of the data or their covariance matrix. It is therefore important for 
practitioners to have information about the behavior of correlation matrices 
in high dimensions. 

We now turn to our study of sample correlation matrices. The main result 
is Theorem 1, which states that under the model considered there (related to 
the classical one in random matrix theory), results concerning the spectral 
distribution and the largest eigenvalue carry over, without much modifica- 
tion, from sample covariance matrices to sample correlation matrices. 

Before we proceed, we need to establish some notation. In the 
remainder of the paper, $\mathbb{C}^+ = \{z \in \mathbb{C} : \mathrm{Im}[z] > 0\}$. We call $v'$ the transpose of the 
vector $v$ and use the same notation for matrices. We use $|||M|||_2$ to denote 
the operator norm of a matrix $M$, that is, its largest singular value. For a 
positive semidefinite matrix, it is obviously its largest eigenvalue. If $Y$ is 
an $n \times p$ matrix, we naturally denote by $Y_{i,j}$ its $(i,j)$th entry and call $\bar{Y}$ the 
matrix whose jth column is constant and equal to $\bar{Y}_{\cdot,j}$. Finally, the sample 
covariance matrix of the data stored in matrix $Y$ is 

\[ S_p = \frac{1}{n-1} (Y - \bar{Y})'(Y - \bar{Y}). \]

2.1. A simple lemma. The crux of our argument is going to be that cor- 
relation matrices can be represented as the products of certain matrices, one 
of them being a covariance matrix. Hence, we will have solved our problem 
if we can show that these other matrices are "not too far" from matrices 
we understand well and if we can show that "nothing" (as far as spectral 
properties are concerned) is lost when replacing them by these better 
understood matrices. Before we state our main theorem and prove it, we state 
derstood matrices. Before we state our main theorem and prove it, we state 
two results of independent interest, on which we will rely in the proof. 

Lemma 1. Suppose that $M_p$ is a $p \times p$ Hermitian random matrix whose 
spectral characteristics [spectral distribution $F_p$ or largest eigenvalue $\lambda_1(M_p)$] 
converge a.s. to a limit and whose spectral norm is (a.s.) bounded as $p \to \infty$. 
Suppose that $D_p$ is a $p \times p$ diagonal matrix and that $|||D_p - \mathrm{Id}_p|||_2 \to 0$ a.s. 
Then the spectral characteristics of $D_p M_p D_p$ and $D_p^{-1} M_p D_p^{-1}$ have the same 
limits as those of $M_p$. 

Proof. The assumption $|||D_p - \mathrm{Id}_p|||_2 \to 0$ implies that for p large enough, 
$D_p$ is invertible. Now, 

\[ |||M_p - D_p M_p D_p|||_2 = |||M_p - M_p D_p + M_p D_p - D_p M_p D_p|||_2 \]
\[ \le |||M_p|||_2 \, |||D_p - \mathrm{Id}_p|||_2 + |||D_p - \mathrm{Id}_p|||_2 \, |||D_p|||_2 \, |||M_p|||_2 \to 0 \quad \text{a.s.} \]

Using Weyl's inequality (see [7], Corollary III.2.6), that is, the fact that for 
Hermitian matrices $A$ and $B$, and any i, if $\lambda_i(A)$ denotes the ith eigenvalue 
of $A$, ordered in decreasing order, $|\lambda_i(A) - \lambda_i(B)| \le |||A - B|||_2$, we conclude 
that 

\[ \max_{k=1,\ldots,p} |\lambda_k(M_p) - \lambda_k(D_p M_p D_p)| \to 0 \quad \text{a.s.} \]

Because $|||M_p|||_2$ is bounded a.s., the two sequences are a.s. asymptotically 
equally distributed (see [25], page 62, or [24]). Therefore, if $F_p(M_p)$ converges weakly 
to $F$, then $F_p(D_p M_p D_p)$ also converges to $F$. 

If $|||D_p - \mathrm{Id}_p|||_2 \to 0$, then $|||D_p^{-1} - \mathrm{Id}_p|||_2 \to 0$ too. So, the same results 
hold when we replace $D_p$ by $D_p^{-1}$. □ 
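
The two inequalities used in this proof can be checked numerically. The sketch below is only an illustration under assumed inputs: a symmetric Gaussian matrix stands in for $M_p$ and a small random diagonal perturbation of the identity for $D_p$; the helper `weyl_gap` is a hypothetical name.

```python
import numpy as np

def weyl_gap(M, d):
    """Return (max_k |lambda_k(M) - lambda_k(DMD)|, |||M - DMD|||_2) for D = diag(d)."""
    D = np.diag(d)
    DMD = D @ M @ D
    lam_M = np.linalg.eigvalsh(M)        # ascending order; pairing by rank is unchanged
    lam_DMD = np.linalg.eigvalsh(DMD)
    max_shift = np.max(np.abs(lam_M - lam_DMD))
    op_norm = np.linalg.norm(M - DMD, ord=2)
    return max_shift, op_norm

rng = np.random.default_rng(1)
p = 300
A = rng.standard_normal((p, p))
M = (A + A.T) / np.sqrt(2 * p)                   # symmetric test matrix
d = 1 + 0.01 * rng.uniform(-1, 1, size=p)        # so that |||D - Id|||_2 <= 0.01
max_shift, op_norm = weyl_gap(M, d)
print(max_shift, op_norm)
```

The eigenvalue shift should never exceed the operator norm of the difference, which in turn is controlled by $|||M|||_2$ times the size of the perturbation of $D$.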

The previous lemma is helpful in our context thanks to the following 
elementary fact, which is standard in multivariate statistics. 

Fact 1 (Correlation matrix as function of covariance matrix). Let $C_p$ 
denote the correlation matrix of our data and $S_p$ the covariance matrix of 
the data. Let $D_p(S_p)$ denote the diagonal matrix consisting of the diagonal 
of $S_p$. We then have 

\[ C_p = [D_p(S_p)]^{-1/2} \, S_p \, [D_p(S_p)]^{-1/2}. \]

Proof. This is just a simple consequence of the fact that if $D$ is a 
diagonal matrix, then 

\[ (DHD)_{i,j} = d_{i,i} H_{i,j} d_{j,j}. \]

Note that $C_p(i,j) = S_p(i,j)/\sqrt{S_p(i,i) S_p(j,j)}$ and the assertion follows. □ 

As a consequence of the previous lemma and fact, we will deduce the 
asymptotic spectral properties of correlation matrices from those of covari- 
ance matrices by simply showing convergence of the diagonal of Sp (or a 
scaled version of it) to Idp in operator norm. 
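
Fact 1 translates directly into code. The following sketch assumes NumPy and uses a small simulated data set; `corr_from_cov` is a hypothetical helper name, and the mixing matrix is arbitrary.

```python
import numpy as np

def corr_from_cov(S):
    """C = D^{-1/2} S D^{-1/2}, with D the diagonal of the covariance matrix S."""
    d_inv_sqrt = 1.0 / np.sqrt(np.diag(S))
    # Entrywise form of D H D: (DHD)_{ij} = d_ii * H_ij * d_jj.
    return S * np.outer(d_inv_sqrt, d_inv_sqrt)

rng = np.random.default_rng(2)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 3.0, 0.1],
                   [0.0, 0.0, 0.0, 0.5]])
Y = rng.standard_normal((50, 4)) @ mixing        # rows are observations
S = np.cov(Y, rowvar=False)                      # sample covariance, normalized by n - 1
C = corr_from_cov(S)
print(np.allclose(C, np.corrcoef(Y, rowvar=False)))
```

The comparison with `np.corrcoef` is a sanity check that the diagonal rescaling of the covariance matrix reproduces the usual sample correlation matrix.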

2.2. Spectra of large-dimensional correlation matrices. We are now ready 
to state the main theorem of this section. 



Theorem 1. Suppose that $X$ is an $n \times p$ matrix of i.i.d. random 
variables with variance 1. Assume, without loss of generality, that their 
common mean is 0. Denote by $X_{i,j}$ the $(i,j)$th entry of $X$. Assume, further, 
that $E(|X_{i,j}|^4 (\log |X_{i,j}|)^{2+2\varepsilon}) < \infty$. Suppose that $\Sigma_p$ is a $p \times p$ covariance 
matrix and let $\Gamma_p$ denote the corresponding correlation matrix. Assume that 
$|||\Gamma_p|||_2 \le K$ for all p. 

Let 

• $Y = X\Sigma_p^{1/2}$ ($Y$ is the observed data matrix; the n observed data vectors 
are stored in the rows of $Y$); 

• $Y_1 = X\Gamma_p^{1/2}$. 

The spectral properties of $\mathrm{corr}(Y)$, the sample correlation matrix of the 
data, are then the same as the spectral properties of 

\[ \Gamma_p^{1/2} (X - \bar{X})'(X - \bar{X}) \Gamma_p^{1/2}/(n-1) = (Y_1 - \bar{Y}_1)'(Y_1 - \bar{Y}_1)/(n-1). \]

In particular, the Stieltjes transform of the limiting spectral distribution of 
$\mathrm{corr}(Y)$ satisfies the Marcenko-Pastur equation, with parameter the spectral 
distribution of $\Gamma_p$. Namely, if $H_p$, the spectral distribution of $\Gamma_p$, has a.s. a 
limit $H$, if p/n has a finite limit $\rho$ and if $m_n$ is the Stieltjes transform of 
$\mathrm{corr}(Y)$, we have, letting $w_n(z) = -(1 - p/n)/z + (p/n) m_n(z)$, 

\[ w_n(z) \to w(z) \ \text{a.s., which satisfies} \quad -\frac{1}{w(z)} = z - \rho \int \frac{\lambda \, dH(\lambda)}{1 + \lambda w(z)}, \]

and $w$ is the unique function mapping $\mathbb{C}^+$ into $\mathbb{C}^+$ to satisfy this equation. 

Also, if the norm of $\Gamma_p^{1/2} (X - \bar{X})'(X - \bar{X}) \Gamma_p^{1/2}/(n-1)$ has a limit in 
which $\Gamma_p$ intervenes only through its eigenvalues, the norm of $\mathrm{corr}(Y)$ has 
the same limit. 
This theorem is related to that of [27], which was concerned with $\Gamma_p = \mathrm{Id}_p$, 
which would amount to doing multivariate analysis with i.i.d. variables, 
an assumption that, for obvious statistical reasons, practitioners are not 
willing to make. Here, by contrast, we are able to handle general covariance 
structures, assuming that the spectral norm of $\Gamma_p$ is bounded. However, [27] 
required only four moments and we require a little more. We explain in 
Section 2.3.2 why this is the case. 

We note that the proof can actually handle cases where $|||\Gamma_p|||_2$ grows 
slowly with p. We refer the reader to [44] for more information on the 
Marcenko-Pastur equation. We note that the paper [44] is an important 
strengthening of the result of [35], dealing, in particular, with nondiagonal 
covariance matrices. 
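
For a given z in the upper half-plane, the equation $-1/w(z) = z - \rho \int \lambda \, dH(\lambda)/(1 + \lambda w(z))$ can be solved numerically. The sketch below uses a plain fixed-point iteration, which is an assumption made here for illustration and not a method taken from [44]; $H$ is replaced by the empirical distribution of a supplied list of eigenvalues, and the names `solve_mp` and `mp_residual` are hypothetical.

```python
import numpy as np

def solve_mp(z, eigs, rho, n_iter=2000, tol=1e-13):
    """Fixed-point iteration for -1/w = z - rho * mean(lam / (1 + lam * w)).

    H is taken to be the empirical distribution of `eigs`; z should have Im z > 0.
    """
    eigs = np.asarray(eigs, dtype=float)
    w = -1.0 / z                                  # starting point in the upper half-plane
    for _ in range(n_iter):
        w_new = -1.0 / (z - rho * np.mean(eigs / (1.0 + eigs * w)))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

def mp_residual(w, z, eigs, rho):
    """Absolute difference between the two sides of the equation at w."""
    eigs = np.asarray(eigs, dtype=float)
    return abs(-1.0 / w - (z - rho * np.mean(eigs / (1.0 + eigs * w))))

z, rho = 2.0 + 1.0j, 0.5
w = solve_mp(z, np.ones(10), rho)                 # identity correlation: H is a point mass at 1
print(w, mp_residual(w, z, np.ones(10), rho))
```

When $H$ is a point mass at 1, the equation reduces to the quadratic $z w^2 + (z - \rho + 1) w + 1 = 0$, which gives an independent check of the computed value.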

Recent progress has led to fairly explicit characterization of the norm 
of large-dimensional sample covariance matrices, a fact that makes these 
results potentially useful in, among other fields, statistics. In particular, re- 
sults concerning limiting spectral distributions do not, in general, provide 
any information about the localization of the largest eigenvalue of the corre- 
sponding matrices. For many practitioners and, in particular, those dealing 
with principal component analysis (see, e.g., [36]), it is important to have this 
localization information. Our analysis, combined with recent results, allows 
us to characterize the limit of the largest eigenvalue of sample correlation 
matrices in certain cases. 

In particular, the following consequence for the norm of the correlation 
matrix can be drawn from the recent article [16], specifically Fact 2 there 
(which is partly a consequence of a deep result in [4]). 

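Corollary 1. Under the assumptions of Theorem 1, if $\lambda_1(\Gamma_p)$ tends to 
the endpoint of the support of $H$ and the model $\{\Gamma_p, n, p\}$ is in the class $\mathcal{G}$ 
defined in [16], then 

\[ |||\mathrm{corr}(Y)|||_2 - \mu_{n,p} \to 0 \quad \text{a.s.,} \]

where 

\[ \mu_{n,p} = \frac{1}{c_0} \left( 1 + \frac{p}{n} \int \frac{\lambda c_0}{1 - \lambda c_0} \, dH(\lambda) \right), \]

\[ \frac{n}{p} = \int \left( \frac{\lambda c_0}{1 - \lambda c_0} \right)^2 dH(\lambda), \quad c_0 \in [0, 1/\lambda_1(\Gamma_p)). \]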

Although the result might seem somewhat cryptic (the conditions set 
forth in [16] are rather complicated to describe), it basically says that if the 
largest eigenvalues of $\Gamma_p$ are sufficiently close to one another (the precise 
mathematical meaning of this statement is contained in the assumptions 
made in [16]), then the largest eigenvalue of $C_p$ will converge to the endpoint 
of the limiting spectral distribution of $\mathrm{corr}(Y)$. We insist on the fact that 
the quantities above ($c_0$ and $\mu_{n,p}$) are fairly easy to compute explicitly with 
a computer, making them relevant in practice. We refer the reader to [16] 
for more information about these problems, as well as examples of matrices 
$\Gamma_p$ for which the results hold. 
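
As an illustration of how such a computation might go, the sketch below solves for $c_0$ by bisection and then evaluates $\mu_{n,p}$. It assumes that the defining equations read $n/p = \int (\lambda c_0/(1 - \lambda c_0))^2 \, dH(\lambda)$ with $c_0 \in [0, 1/\lambda_1(\Gamma_p))$ and $\mu_{n,p} = c_0^{-1}(1 + (p/n)\int \lambda c_0/(1 - \lambda c_0) \, dH(\lambda))$, and that $H$ is replaced by the empirical distribution of a list of eigenvalues; the function name `edge_location` is hypothetical.

```python
import numpy as np

def edge_location(eigs, n, tol=1e-12):
    """Return (c0, mu) under the assumed equations stated in the lead-in.

    c0 solves n/p = mean((lam*c/(1 - lam*c))**2) on [0, 1/max(eigs));
    mu = (1/c0) * (1 + (p/n) * mean(lam*c0/(1 - lam*c0))).
    """
    eigs = np.asarray(eigs, dtype=float)
    p = eigs.size
    target = n / p

    def g(c):
        t = eigs * c / (1.0 - eigs * c)
        return np.mean(t ** 2)                    # increasing in c on [0, 1/max(eigs))

    lo, hi = 0.0, 1.0 / eigs.max()
    while hi - lo > tol:                          # bisection on the monotone function g
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    c0 = 0.5 * (lo + hi)
    mu = (1.0 + (p / n) * np.mean(eigs * c0 / (1.0 - eigs * c0))) / c0
    return c0, mu

c0, mu = edge_location(np.ones(100), n=400)       # identity correlation, p/n = 1/4
print(c0, mu)
```

In the identity case the result can be compared with the classical edge $(1 + \sqrt{p/n})^2$, which serves as a consistency check on the assumed form of the equations.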

2.3. Proof of Theorem 1. The proof is in three steps. The first involves 
showing that to understand the spectral properties of $\mathrm{corr}(Y)$, we need to 
focus only on the matrix $Y_1'Y_1$ [or $(Y_1 - \bar{Y}_1)'(Y_1 - \bar{Y}_1)$]. We then need a 
truncation and centralization step for the entries of $X$. Finally, we use a 
concentration-of-measure result to show that the diagonal of the 
corresponding covariance matrix indeed converges in operator norm to the 
identity. We postpone a formal proof of Theorem 1 to Section 2.3.5, where we 
put all of the elements together. 

2.3.1. Replacing $\Sigma_p$ by $\Gamma_p$. Since the correlation coefficient is invariant 
under shifting and (positive) scaling of random variables, we see that for 
any diagonal matrix $D$ with positive entries, 

\[ \mathrm{corr}(Y) = \mathrm{corr}(YD) \]

since $(YD)_{i,j} = Y_{i,j} d_{j,j}$. In particular, for $D$, we can use $(\mathrm{diag}(\Sigma_p))^{-1/2}$, 
which clearly has positive entries. After this adjustment, the data matrix 
we will focus on, $Y_2$, takes the form 

\[ Y_2 = XG, \quad \text{where } G = \Sigma_p^{1/2} (\mathrm{diag}(\Sigma_p))^{-1/2} = \Sigma_p^{1/2} D, \]

and $G'G = \Gamma_p$. Note, in particular, that since $\Gamma_p$ is a correlation matrix, its 
diagonal consists of 1's. Because $G$ is not symmetric, it is not, in general, 
equal to $\Gamma_p^{1/2}$. We thus need to explain why we will be able to rely on existing 
random matrix results since, for example, [44] requires the data to have the 
form $X\Gamma_p^{1/2}$. 

Since $G$ is similar to $D^{1/2} \Sigma_p^{1/2} D^{1/2}$, all of its eigenvalues are real and 
nonnegative. Further, because $G'G = \Gamma_p$, the singular values of $G$ are equal to 
the square roots of the eigenvalues of $\Gamma_p$. Because $\Sigma_p$ and $D$ are invertible, 
so is $\Sigma_p^{1/2} D$. Therefore, the spectrum of the matrix of interest, $Y_2'Y_2/n$, is 
the same as the spectrum of $X'X \Sigma_p^{1/2} D^2 \Sigma_p^{1/2}/n$. Even though, in general, 
$\Sigma_p^{1/2} D^2 \Sigma_p^{1/2} \ne \Gamma_p$, these matrices have the same eigenvalues. Because the 
Marcenko-Pastur equation involves only the eigenvalues of the deterministic 
matrix in question, the limiting spectral distribution of $Y_2'Y_2/n$ is the same 
as the limiting spectral distribution of $\Gamma_p^{1/2} X'X \Gamma_p^{1/2}/n = Y_1'Y_1/n$. A similar 
conclusion applies to the largest eigenvalue if it depends on $\Gamma_p$ only through 
$\Gamma_p$'s spectrum. 

So, in what follows, we only need to investigate $\mathrm{corr}(Y_2)$ or $\mathrm{corr}(Y_1)$ to 
understand $\mathrm{corr}(Y)$. 
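
The two facts used in this subsection, invariance of the sample correlation matrix under positive column scaling and equality of the spectra of $GG'$ and $G'G$, can be checked on a small example. The sketch below is illustrative only: the population covariance is an arbitrary positive definite matrix built for the purpose, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 5
B = rng.standard_normal((p, p))
Sigma = B @ B.T + np.eye(p)                       # hypothetical population covariance

# Symmetric square root of Sigma via its eigendecomposition.
vals, vecs = np.linalg.eigh(Sigma)
Sigma_half = (vecs * np.sqrt(vals)) @ vecs.T

D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))        # D = diag(Sigma)^{-1/2}
G = Sigma_half @ D
Gamma = G.T @ G                                   # population correlation matrix

X = rng.standard_normal((n, p))
Y = X @ Sigma_half
Y2 = X @ G                                        # same as Y @ D

same_corr = np.allclose(np.corrcoef(Y, rowvar=False), np.corrcoef(Y2, rowvar=False))
same_spectrum = np.allclose(np.linalg.eigvalsh(G @ G.T), np.linalg.eigvalsh(Gamma))
print(same_corr, same_spectrum, np.allclose(G, G.T))
```

The last printed value shows that $G$ itself is typically not symmetric, which is why the argument goes through eigenvalues rather than through the matrices themselves.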
2.3.2. Truncation and centralization step. In this subsubsection, we show 
that we can truncate the entries of $X$, the $n \times p$ matrix full of i.i.d. random 
variables, at level $\sqrt{n}/(\log n)^{(1+\varepsilon)/2} = \sqrt{n}\,\delta_n$ and almost surely not change 
the value of $\mathrm{corr}(Y)$, at least for p large enough. The same holds when the 
truncated values are then re-centered. The conclusion of this subsubsection 
is that it is enough to study matrices $X$ whose entries are i.i.d. with mean 0 
and bounded in absolute value by $C\sqrt{n}/(\log n)^{(1+\varepsilon)/2}$. 

The proof is similar to the argument given for the proof of Lemma 2.2 
in [50]. However, because the term $1/(\log n)^{(1+\varepsilon)/2}$ is crucial in our later 
arguments and the authors of [50] gloss over the details of their choice of $\delta_n$, 
we feel a full argument is needed to give a convincing proof, although we 
do not claim that the arguments are new. This is where we need a slightly 
stronger assumption than just the finite fourth moment assumption made in 
[50]. (Our problem is with Remark 1 in [50], which is not clearly justified. 
There also appear to be counterexamples to this claim. However, it does 
not seem that (the full strength of) this remark is ever really used in that 
paper and the rest of the arguments are clear.) We have the following lemma, 
which closely follows Lemma 2.2 in [50]. 

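Lemma 2 (Truncation). Let $X$ be an infinite double array of identically 
distributed (i.d.) random variables. Suppose that $X_n$ is an $n \times p$ matrix of 
identically distributed random variables, with mean 0, variance 1 and whose 
entries, $X_{i,j}$, satisfy $E(|X_{i,j}|^4 (\log |X_{i,j}|)^{2+2\varepsilon}) < \infty$. $X_n$ corresponds to the 
upper-left corner of $X$. Suppose that p/n has a finite limit $\rho$. Let $T_n$ denote 
the matrix with $(i,j)$th entry $X_{i,j} 1_{|X_{i,j}| < \sqrt{n}/(\log n)^{(1+\varepsilon)/2}}$. Then, 

\[ P(X_n \ne T_n \ \text{i.o.}) = 0. \]

Proof. Because of the moment assumption made on $X_{i,j}$, we have, if 
we let $f_\varepsilon(x) = x^4 (\log x)^{2(1+\varepsilon)}$, 

\[ \int_0^\infty f_\varepsilon'(y) P(|X_{i,j}| > y) \, dy = \sum_{m=0}^\infty \int_{u_m}^{u_{m+1}} f_\varepsilon'(y) P(|X_{i,j}| > y) \, dy < \infty \]

for any increasing sequence $\{u_m\}_{m=0}^\infty$, with $u_0 = 0$ and $u_m \to \infty$ as $m \to \infty$. 
Now, when y is large enough, $f_\varepsilon'(y) \ge 0$, so 

\[ \int_{u_m}^{u_{m+1}} f_\varepsilon'(y) P(|X_{i,j}| > y) \, dy \ge P(|X_{i,j}| > u_{m+1}) (f_\varepsilon(u_{m+1}) - f_\varepsilon(u_m)). \]

Let $\gamma_m = 2^m$ and $u_m = \sqrt{\gamma_m}/(\log \gamma_m)^{(1+\varepsilon)/2}$. Note that $u_m$ is increasing 
for m sufficiently large. Elementary computations show that as m tends 
to $\infty$, $u_m^4 (\log u_m)^{2+2\varepsilon} \sim \gamma_m^2 2^{-(2+2\varepsilon)}$. Consequently, $f_\varepsilon(u_{m+1}) - f_\varepsilon(u_m) \sim 3 \times 2^{2(m-1-\varepsilon)}$. 
Note that our moment requirements therefore imply that 

\[ \sum_{m=1}^\infty 2^{2m} P(|X_{i,j}| > u_m) < \infty. \]

Now, for n satisfying $\gamma_{m-1} \le n < \gamma_m$, we threshold $X_n(i,j)$ at level $u_{m-1}$. 
(In what follows, $2\rho\gamma_m$ should be replaced by the smallest integer greater 
than this number, but to avoid cumbersome notation, we do not stress this 
particular fact.) We have 

\[ P(X_n \ne T_n \ \text{i.o.}) \le \sum_{m=k}^\infty P\left( \bigcup_{\gamma_{m-1} \le n < \gamma_m} \bigcup_{i=1}^{n} \bigcup_{j=1}^{p} (|X_n(i,j)| > u_{m-1}) \right) \]
\[ \le \sum_{m=k}^\infty P\left( \bigcup_{\gamma_{m-1} \le n < \gamma_m} \bigcup_{i=1}^{\gamma_m} \bigcup_{j=1}^{2\rho\gamma_m} (|X_n(i,j)| > u_{m-1}) \right) \]
\[ = \sum_{m=k}^\infty P\left( \bigcup_{i=1}^{\gamma_m} \bigcup_{j=1}^{2\rho\gamma_m} (|X_n(i,j)| > u_{m-1}) \right) \]
\[ \le 2\rho \sum_{m=k}^\infty \gamma_m^2 P(|X_{i,j}| > u_{m-1}) \]
\[ = 8\rho \sum_{m=k}^\infty 2^{2(m-1)} P(|X_{i,j}| > u_{m-1}). \]

The right-hand side tends to 0 when k tends to infinity and the left-hand 
side is independent of k. We conclude that 

\[ P(X_n \ne T_n \ \text{i.o.}) = 0. \] □ 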

Lemma 3 (Centralization). Let $TC_n$ denote the matrix with entries 
$TC_n(i,j) = T_n(i,j) - E T_n(i,j)$. Then, 

\[ \frac{1}{n} |||T_n'T_n - TC_n'TC_n|||_2 \to 0 \quad \text{a.s.} \]

Proof. The proof would be a simple repetition of the arguments in 
the proof of Lemma 2.3 in [50], with r = 1/2 and $\delta = (\log n)^{-(1+\varepsilon)/2}$ in the 
notation of their paper, so we omit it. Note that that proof finds a bound 
on the spectral norm of $T_n'T_n - TC_n'TC_n$. □ 

The centralization lemma (Lemma 3) guarantees that the spectral 
characteristics of $G'TC_n'TC_nG/n$ are asymptotically the same as those of $G'T_n'T_nG/n$: 
this is a consequence of the fact that $|||G|||_2^2 = \lambda_1(\Gamma_p)$ is uniformly bounded, 
as well as of Weyl's inequality, 

\[ \max_i |\lambda_i(G'TC_n'TC_nG/n) - \lambda_i(G'T_n'T_nG)/n| \le |||G'(TC_n'TC_n - T_n'T_n)G/n|||_2 \]
\[ \le |||G'G|||_2 \, |||(TC_n'TC_n - T_n'T_n)/n|||_2. \]
Therefore, the spectral characteristics of $G'TC_n'TC_nG/n$ and those of 
$\Gamma_p^{1/2} X_n'X_n \Gamma_p^{1/2}/n = Y_1'Y_1/n$ are also asymptotically the same, by the 
truncation lemma, since a.s. $\Gamma_p^{1/2} X_n'X_n \Gamma_p^{1/2}/n = \Gamma_p^{1/2} T_n'T_n \Gamma_p^{1/2}/n$. 
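
The truncation and centralization steps can be mimicked numerically. The sketch below is a rough illustration, not the construction used in the proofs: it truncates at level $\sqrt{n}/(\log n)^{(1+\varepsilon)/2}$ and, as an assumption made here, subtracts the empirical mean of the truncated entries in place of the population expectation $E T_n(i,j)$. The function name, the value of $\varepsilon$ and the Student-t entries are all hypothetical choices.

```python
import numpy as np

def truncate_and_center(X, eps):
    """Truncate entries of the n x p matrix X at sqrt(n)/(log n)^((1+eps)/2), then
    subtract the empirical mean of the truncated entries (a stand-in for E T_n(i,j))."""
    n = X.shape[0]
    level = np.sqrt(n) / np.log(n) ** ((1 + eps) / 2)
    T = np.where(np.abs(X) < level, X, 0.0)       # entries at or above the level are zeroed
    TC = T - T.mean()
    return T, TC, level

rng = np.random.default_rng(5)
n, p, eps = 1000, 500, 0.5
# Heavy-tailed entries with variance 1: Student t with 5 degrees of freedom, rescaled.
X = rng.standard_t(5, size=(n, p)) / np.sqrt(5.0 / 3.0)
T, TC, level = truncate_and_center(X, eps)
fraction_truncated = np.mean(T != X)
gap = np.linalg.norm(T.T @ T - TC.T @ TC, ord=2) / n
print(level, fraction_truncated, gap)
```

In such a simulation only a small fraction of entries is affected by truncation, and the normalized operator norm gap between the truncated and the centered matrices is small, in line with Lemmas 2 and 3.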

2.3.3. Controlling the diagonal in operator norm. Now that we have seen 
that a.s. we can replace $X_n$ by $TC_n$ without incurring any loss in operator 
norm, we will essentially focus in the analysis on the matrices obtained by 
replacing $X_n$ by $TC_n$ since the results will be a.s. the same. Recall that the 
entries of $TC_n$ are bounded by $C\sqrt{n}/(\log n)^{(1+\varepsilon)/2}$. 

We turn our attention to showing that the diagonal of $G'X'XG/n$ is 
close to 1. We remind the reader that $G = \Sigma_p^{1/2} (\mathrm{diag}(\Sigma_p))^{-1/2}$ and 
assume, without loss of generality, that $C \le 2$. As a matter of fact, since 
$E(T_n(i,j)) \to E(X_n(i,j)) = 0$, $|TC_n(i,j)| \le |T_n(i,j)| + 1$ for n large enough 
and since $|T_n(i,j)| \le \sqrt{n}/(\log n)^{(1+\varepsilon)/2}$, $|TC_n(i,j)| \le 2\sqrt{n}/(\log n)^{(1+\varepsilon)/2}$ for 
n large enough. This shows that we can take $C \le 2$ without loss of generality. 

Lemma 4. Let us focus on $S_p = \frac{1}{n} Y_2'Y_2 = \frac{1}{n} G'X'XG$, a quantity often 
studied in random matrix theory. 
When $p \asymp n$, we have 

\[ \max_{i=1,\ldots,p} |\sqrt{S_p(i,i)} - 1| \to 0 \quad \text{a.s.} \]

Proof. First, let $W_p = G'TC_n'TC_nG/n$. We note that, according to 
results in the previous subsubsection, 

\[ |S_p(i,i) - W_p(i,i)| = |e_i'(S_p - W_p)e_i| \le |||S_p - W_p|||_2 \to 0 \quad \text{a.s.} \]

Hence, the result will be shown if we can show it for $W_p(i,i)$. 

We let $v_i$ denote the ith column of $G$. Letting $M = X'X/n$, we note that 

\[ S_p(i,i) = v_i'Mv_i = \|Xv_i/\sqrt{n}\|_2^2. \]

Now, consider the function $f_i$ from $\mathbb{R}^{np}$ to $\mathbb{R}$ defined by turning the vector 
$x$ into the matrix $X$, by first filling the rows of $X$, and then computing the 
Euclidean norm of $Xv_i$. In other words, 

\[ f_i(x) = \|Xv_i\|_2. \]

This function is clearly convex and 1-Lipschitz with respect to the Euclidean 
norm. As a matter of fact, for $\theta \in [0,1]$ and $x, z \in \mathbb{R}^{np}$, 

\[ f_i(\theta x + (1-\theta)z) = \|(\theta X + (1-\theta)Z)v_i\|_2 \le \|\theta Xv_i\|_2 + \|(1-\theta)Zv_i\|_2 = \theta f_i(x) + (1-\theta) f_i(z). \]

Similarly, 

\[ |f_i(x) - f_i(z)| = |\|Xv_i\|_2 - \|Zv_i\|_2| \le \|(X - Z)v_i\|_2 \le \|X - Z\|_F \|v_i\|_2 = \|x - z\|_2, \]

using the Cauchy-Schwarz inequality and the fact that $\|v_i\|_2^2 = (G'G)(i,i) = 
\Gamma_p(i,i) = 1$. The same is true when $X$ is replaced by $TC_n$. 

Because the $TC_n(i,j)$ are independent and bounded, we can apply 
recent results concerning concentration of measure of convex Lipschitz 
functions. In particular, from Corollary 4.10 in [33] (a consequence of Talagrand's 
inequality; see [46] and Theorem 4.6 in [33]), we see that for any r > 0, we 
have, if $m_{f_i}$ is a median of $f_i(TC_n)$, 

\[ P(|f_i(TC_n) - m_{f_i}| > r) \le 4 \exp(-r^2/(16C^2 n/(\log n)^{(1+\varepsilon)})). \]

In particular, since $\sqrt{W_p(i,i)} = f_i(TC_n)/\sqrt{n}$, letting $m_{i,i}$ denote a median 
of $f_i(TC_n)/\sqrt{n}$, we see that 

\[ P(|\sqrt{W_p(i,i)} - m_{i,i}| > r) \le 4 \exp(-r^2 (\log n)^{(1+\varepsilon)}/(16C^2)). \]

Finally, 

\[ P\left( \max_i |\sqrt{W_p(i,i)} - m_{i,i}| > r \right) \le 4p \exp(-r^2 (\log n)^{(1+\varepsilon)}/(16C^2)), \]

so, since $p \asymp n$, using the first Borel-Cantelli lemma, we see that 

\[ \max_{1 \le i \le p} |\sqrt{W_p(i,i)} - m_{i,i}| \to 0 \quad \text{a.s.} \]

All we have to do now is to show that the $m_{i,i}$ are all close to 1. We let 
$v_n = \mathrm{var}(TC_n(i,j))$. Note that $v_n$ is independent of i, j and that $v_n \to 1$ as 
$n \to \infty$. 

Since we have Gaussian concentration, using Proposition 1.9 in [33], we 
have 

\[ |E(\sqrt{W_p(i,i)}) - m_{i,i}| \le 8C\sqrt{\pi} (\log n)^{-(1+\varepsilon)/2} \]

and since $E(W_p(i,i)) = \|v_i\|_2^2 v_n = \Gamma_p(i,i) v_n = v_n$, the variance inequality in 
the same proposition gives 

\[ 0 \le v_n - E(\sqrt{W_p(i,i)})^2 \le \frac{64C^2}{(\log n)^{(1+\varepsilon)}}. \]

Consequently, 

\[ \sqrt{v_n - \frac{64C^2}{(\log n)^{(1+\varepsilon)}}} - \frac{8C\sqrt{\pi}}{(\log n)^{(1+\varepsilon)/2}} \le m_{i,i} \le \sqrt{v_n} + \frac{8C\sqrt{\pi}}{(\log n)^{(1+\varepsilon)/2}}. \]

Therefore, $\max_i |m_{i,i} - 1| = O(\max(|1 - v_n|, (\log n)^{-(1+\varepsilon)/2}))$ and 
we have 

\[ \max_{1 \le i \le p} |\sqrt{W_p(i,i)} - 1| \to 0 \quad \text{a.s.} \]

We can therefore conclude that we also have 

\[ \max_{1 \le i \le p} |\sqrt{S_p(i,i)} - 1| \to 0 \quad \text{a.s.} \] □ 
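
The behavior of the diagonal can be observed in a small simulation. The sketch below makes several assumptions for convenience: Gaussian entries, an equicorrelation matrix for $\Gamma_p$, and a symmetric square root in place of $G$ (so that $G'G = \Gamma_p$ still holds); the function name `max_diag_deviation` and the sizes are hypothetical.

```python
import numpy as np

def max_diag_deviation(n, p, equi=0.3, seed=0):
    """max_i |sqrt(S_p(i,i)) - 1| for S_p = G'X'XG/n, with Gaussian X and G'G = Gamma."""
    rng = np.random.default_rng(seed)
    Gamma = (1 - equi) * np.eye(p) + equi * np.ones((p, p))   # equicorrelation matrix
    vals, vecs = np.linalg.eigh(Gamma)
    G = (vecs * np.sqrt(vals)) @ vecs.T          # symmetric square root, so G'G = Gamma
    X = rng.standard_normal((n, p))
    Y2 = X @ G
    diag_S = np.sum(Y2 ** 2, axis=0) / n         # S_p(i,i) = ||X v_i||_2^2 / n
    return np.max(np.abs(np.sqrt(diag_S) - 1))

# p grows proportionally to n (here p = n/4).
deviations = [max_diag_deviation(n, n // 4) for n in (200, 800, 3200)]
print(deviations)
```

The maximal deviation should shrink as n and p grow together, even though the maximum is taken over more and more diagonal entries.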

We now turn to the more interesting situation of a covariance matrix. 

Lemma 5 (Covariance matrix). We now focus on the matrix 

\[ S_p = \frac{1}{n-1} (Y_2 - \bar{Y}_2)'(Y_2 - \bar{Y}_2). \]

For this matrix, we also have 

\[ \max_{1 \le i \le p} |\sqrt{S_p(i,i)} - 1| \to 0 \quad \text{a.s.} \]

Proof. As before, we let $W_p$ denote the equivalent of $S_p$ computed by 
replacing $X$ by $TC_n$. 

Note that $Y_2 - \bar{Y}_2 = (\mathrm{Id}_n - \frac{1}{n} 11') Y_2 = (\mathrm{Id}_n - \frac{1}{n} 11') XG$. Now, 
$S_p(i,i) = v_i' X' (\mathrm{Id}_n - \frac{1}{n} 11') X v_i/(n-1)$, so the same strategy as above can be employed, 
with f now defined as 

\[ f_i(x) = f_i(X) = \left\| \left( \mathrm{Id}_n - \frac{1}{n} 11' \right) X v_i \right\|_2. \]

This function is again a convex 1-Lipschitz function of x. Convexity is a 
simple consequence of the fact that norms are convex; the Lipschitz coefficient 
is equal to $\|v_i\|_2 \, |||\mathrm{Id}_n - \frac{1}{n} 11'|||_2$. The eigenvalues of the matrix $\mathrm{Id}_n - \frac{1}{n} 11'$ are 
(n - 1) ones and one zero. Its operator norm is therefore 1. We therefore 
have Gaussian concentration when replacing $X$ by $TC_n$. Also, we have the 
same bounds as before on $\max_i |S_p(i,i) - W_p(i,i)|$. All we need to check to 
conclude the proof is that $E(W_p(i,i)) \to 1$. By renormalizing by $1/\sqrt{n-1}$, 
we ensure that $E(W_p(i,i)) = v_n$ and so, as before, the proof is complete. □ 

2.3.4. A remark on $|||(X - \bar X)'(X - \bar X)/(n-1)|||_2$. We now turn to providing
a justification for Corollary 1. This amounts to understanding the behavior
of the largest eigenvalue of $(Y - \bar Y)'(Y - \bar Y)/(n-1)$, which differs slightly
from what is usually investigated in the literature, namely $\tilde S_p = Y'Y/n$, if,
say, $Y$ is assumed to have mean zero entries.

Since in (statistical) practice $S_p = (Y - \bar Y)'(Y - \bar Y)/(n-1)$ is almost
always used, it is of interest to know what happens for this matrix in terms
of largest eigenvalue. Note that $|||\tilde S_p - S_p|||_2$ does not go to zero in general,
so a coarse bound of the type $|\lambda_1(\tilde S_p) - \lambda_1(S_p)| \le |||\tilde S_p - S_p|||_2$ is not enough
to determine the behavior of $\lambda_1(S_p)$ from that of $\lambda_1(\tilde S_p)$.
However, letting $H_n = \mathrm{Id}_n - \frac{1}{n}\mathbf{1}\mathbf{1}'$, we see that

\[ Y - \bar Y = H_nY. \]

Therefore, since $\sigma_1$, the largest singular value, is a matrix norm, we have

\[ \sigma_1(Y - \bar Y)/\sqrt{n-1} \le \sigma_1(H_n)\,\sigma_1(Y/\sqrt{n-1}) = \sigma_1(Y/\sqrt{n-1}) \]

since $H_n$ is a symmetric matrix with $(n-1)$ eigenvalues equal to 1 and one
eigenvalue equal to 0.

Now, because $Y'Y/n$ and $(Y - \bar Y)'(Y - \bar Y)/(n-1)$ have, asymptotically,
the same spectral distribution, letting $l_1$ denote the right endpoint of the
support of this limiting distribution (if it exists), we conclude that

\[ \liminf \sigma_1^2((Y - \bar Y)/\sqrt{n-1}) \ge l_1. \]

Hence, when $|||Y'Y/n|||_2 \to l_1$, we also have

\[ |||(Y - \bar Y)'(Y - \bar Y)/(n-1)|||_2 \to l_1. \]

This justifies the assertion made in Corollary 1 and, more generally, the fact 
that when the norm of a sample covariance matrix which is not re-centered 
(whose entries have mean 0) converges to the right endpoint of the support 
of its limiting spectral distribution, so does the norm of the centered sample 
covariance matrix. 
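
The centering argument of this subsection can be checked numerically. The following is only a minimal sketch, assuming i.i.d. standard Gaussian entries for Y and arbitrary illustrative dimensions; the function name `centered_vs_uncentered_norms` is ours and does not appear in the paper. It verifies that the centering matrix has (n - 1) eigenvalues equal to 1 and one equal to 0, that multiplying by it removes the column means, and that centering cannot increase the largest singular value.

```python
import numpy as np

def centered_vs_uncentered_norms(Y):
    """Return (sigma_1(H_n Y), sigma_1(Y)), both divided by sqrt(n - 1)."""
    n = Y.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # H_n = Id_n - (1/n) 1 1'
    s_centered = np.linalg.norm(H @ Y, 2) / np.sqrt(n - 1)
    s_raw = np.linalg.norm(Y, 2) / np.sqrt(n - 1)
    return s_centered, s_raw

rng = np.random.default_rng(0)
n, p = 400, 200                                   # illustrative sizes (assumption)
Y = rng.standard_normal((n, p))                   # assumed Gaussian data

H = np.eye(n) - np.ones((n, n)) / n
eig_H = np.sort(np.linalg.eigvalsh(H))            # one 0, then (n - 1) ones
s_centered, s_raw = centered_vs_uncentered_norms(Y)
print(s_centered**2, s_raw**2)                    # squared norms of the two matrices
```

Both printed values estimate the largest eigenvalue of the corresponding sample covariance matrix, so they can be compared directly with the right endpoint of the limiting spectrum.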

Finally, we note that when dealing with Sp, the mean of the entries of Y 
does not matter, so we can assume without loss of generality that it is 0. 

2.3.5. Proof of Theorem 1. We now put all of the elements together and 
give the proof of Theorem 1. 

Proof of Theorem 1. As noted in Section 2.2.1, $\mathrm{corr}(Y) = \mathrm{corr}(Y_2)$.
So, to understand the spectral properties of $\mathrm{corr}(Y)$, it is enough to study
those of $\mathrm{corr}(Y_2)$.

Let $S_p = (Y_2 - \bar Y_2)'(Y_2 - \bar Y_2)/(n-1)$ and $D_{S_p} = \mathrm{diag}(S_p)$. We have seen
in Lemma 5 that

\[ |||D_{S_p} - \mathrm{Id}_p|||_2 \to 0 \quad \text{a.s.} \]

Now, using the remark made in Section 2.3.4, and [50], we have

\[ |||(X - \bar X)'(X - \bar X)/(n-1)|||_2 \to (1 + \sqrt{\rho})^2 \quad \text{a.s.} \]

Since $|||\Gamma_p|||_2$ is bounded, we have

\[ |||S_p|||_2 \le |||\Gamma_p|||_2\,|||(X - \bar X)'(X - \bar X)/(n-1)|||_2. \]

Therefore, $|||S_p|||_2$ is a.s. bounded and the assumptions of Lemma 1 are verified.

Using Fact 1 and Lemma 1, we therefore have

\[ |||S_p - \mathrm{corr}(Y_2)|||_2 \to 0 \quad \text{a.s.} \]

Hence,

\[ |||S_p - \mathrm{corr}(Y)|||_2 \to 0 \quad \text{a.s.} \]

Finally, as explained in Section 2.3.1, the spectral properties of $S_p$, when
they involve only the spectral distribution of $G'G$, are the same as those of
$(Y_1 - \bar Y_1)'(Y_1 - \bar Y_1)/(n-1)$. □

3. Elliptically distributed data and generalizations. We now turn our
attention to the problem of finding a Marcenko-Pastur-type system of equa- 
tions to characterize the limiting spectral distribution of sample covariance 
matrices computed from elliptically distributed data and generalizations of 
these distributions. Our aim in doing so is manifold. From a statistical stand- 
point, one issue is to try and explain the lack of robustness in high dimen- 
sions of this estimate of scatter and to explain some of the numerical findings 
highlighted in [20]. From a more mathematical point of view, elliptical dis- 
tributions raise a question concerning data vectors with a somewhat more 
complicated dependence structure than is usually investigated in random 
matrix theory. Their study will therefore force us to confront this difficulty 
and show that our tools allow a generalization of the results beyond elliptical 
distributions (and classical models). 

Elliptical distributions are considered to be good models for financial data. 
We refer to [20] and to the book [37] for interesting discussions of the poten- 
tial relevance of elliptical distributions to problems arising in the analysis of 
this type of data. Naturally, the study of corresponding covariance matrices 
is relevant to problems of portfolio optimization, where sample covariance 
matrices are used to estimate the covariance matrix between assets. This 
latter matrix is key in these problems since the optimal portfolio weights 
depend on the assets' covariance matrix in many formulations. Let us men- 
tion two other properties that make elliptical distributions appealing in the 
financial modeling context. First, we have the tail-dependence properties 
that they induce between components of data vectors, something that, in 
practice, is found in financial data and cannot be accounted for by, say, mul- 
tivariate Gaussian data. Second, at least some of these distributions allow 
for a certain amount of "heavy-tailedness" in the observations. This is often 
mentioned as an important feature in modeling financial data. By contrast, 
it is sometimes advocated in the random matrix community that matrices
with, say, i.i.d. heavy-tailed entries should be studied as models for those
financial data and, in particular, returns or log-returns of stocks. We find
that these simpler heavy-tailed models suffer from at least one deep flaw:
in the case of a crash, many companies or stocks suffer on the same day 
and a model of i.i.d. heavy-tailed entries does not account for this, whereas 
models based on elliptical distributions can. Besides the particulars of dif- 
ferent models, it is also important to notice that the limiting spectra will be 
drastically different under the two types of models and the behavior of ex- 
treme eigenvalues is also very likely to be so. Before we return to our study, 
we refer the reader to [2] and [18] for thorough introductions to elliptical 
distributions. 

We are therefore particularly interested in problems where we observe n
i.i.d. observations of an elliptically distributed vector $v$ in $\mathbb{R}^d$. Specifically,
$v$ can be written as

\[ v = \mu + \lambda\Gamma r, \]

where $\mu$ is a deterministic d-dimensional vector, $\lambda$ is a real-valued random
variable, $r$ is uniformly distributed on the unit sphere in $\mathbb{R}^p$ (i.e., $\|r\|_2 = 1$)
and $\Gamma$ is a $d \times p$ matrix. We let $S = \Gamma\Gamma'$. Here, $S$, a $d \times d$ matrix, is assumed
to be deterministic, and $\lambda$ and $r$ are independent. We call the corresponding
data matrix $X$, which is $n \times d$, that is, the vectors of observations are stacked
horizontally in this matrix. Below, we will assume that $n/p$ and $d/p$ have
finite limits.
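
To make the model concrete, here is a minimal simulation sketch of the representation v = mu + lambda Gamma r. Everything specific in it is an assumption made for illustration only: the function name `sample_elliptical`, the dimensions, the Gaussian choice for the entries of Gamma, and the heavy-tailed distribution chosen for lambda are ours, not the paper's.

```python
import numpy as np

def sample_elliptical(n, Gamma, mu, lam, rng):
    """Rows v_i = mu + lam_i * Gamma r_i, with r_i uniform on the unit sphere of R^p."""
    d, p = Gamma.shape
    g = rng.standard_normal((n, p))
    # A normalized standard Gaussian vector is uniformly distributed on the sphere.
    r = g / np.linalg.norm(g, axis=1, keepdims=True)
    X = mu + lam[:, None] * (r @ Gamma.T)         # n x d data matrix, one observation per row
    return X, r

rng = np.random.default_rng(0)
n, p, d = 200, 100, 80                            # illustrative sizes (assumption)
Gamma = rng.standard_normal((d, p)) / np.sqrt(p)  # hypothetical d x p matrix
mu = np.zeros(d)
# Hypothetical heavy-tailed scale variable, drawn independently of r.
lam = np.sqrt(p) * np.sqrt(5.0 / rng.chisquare(5, size=n))
X, r = sample_elliptical(n, Gamma, mu, lam, rng)
S = Gamma @ Gamma.T                               # the d x d matrix S = Gamma Gamma'
```

Because a single lambda multiplies every coordinate of an observation, a large draw of lambda moves all components at once, which is the "crash" behavior that i.i.d. heavy-tailed entries cannot reproduce.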

As it turns out, the models for r which we can handle allow for more 
complicated dependence structures than the one induced by a uniform distribution
on the unit sphere in $\mathbb{R}^p$. So, for $r$, we will focus on random vectors
whose distribution satisfies certain concentration of measure properties. For 
more details, we refer the reader to Section 3.2. 

Note that when studying the limiting spectral distribution of a properly
scaled version of $X'X/n$, we can, without loss of generality, assume that
$\mu = 0$ and $E(r) = 0$. As a matter of fact, if we define $\tilde X$ to be the data
matrix obtained by replacing $v_i$ by $\lambda_i\Gamma(r_i - E(r_i))$, it is clear that $\tilde X'\tilde X$
is a finite rank perturbation of $X'X$ and hence properly scaled versions
of these matrices have the same limiting spectral distributions. Also, since
$(X - \bar X)'(X - \bar X)$ is a rank one perturbation of $X'X$, we see that, after
proper scaling, it has the same limiting spectral distribution as the properly
scaled $X'X$. In what follows, we will therefore assume that

\[ v_i = \lambda_i\Gamma(r_i - E(r_i)), \quad i = 1,\ldots,n. \]

As is now classical, we will obtain our main result on the question of
characterizing the limiting spectral distribution of a properly scaled version
of $(X - \bar X)'(X - \bar X)$ (Theorem 2) by making use of Stieltjes transform
arguments. If needed, we refer the reader to [22] for background on the
connection between weak convergence of distributions and pointwise convergence
of Stieltjes transforms.

We note that our model basically falls into the class of covariance matrices
of the type $T_p^{1/2}X_{n,p}^*L_nX_{n,p}T_p^{1/2}$, where $X_{n,p}$ is a random matrix,
independent of the square matrices $T_p$ ($p \times p$) and $L_n$ ($n \times n$), which can also
be assumed to be random, as long as their spectral distributions converge
to a limit. These matrices have been the subject of investigations already:
see [47] Theorem 2.43, which refers to [9, 34] and [23] and the recent [41],
which refers to [12] and to [51] for systems of equations involving Stieltjes
transforms similar to the one we will derive. We note that under some
distributional restrictions, methods of free probability using the S-transform
(see [49]) could be used to derive a characterization of the limit.

However, in all of these papers, the entries of $X_{n,p}$ are assumed to be
independent. Naturally, this is not the case in the situation we are considering
and, as we make clear below, the dependence structures of the models we
are considering can be quite complicated, as the example of the Gaussian
copula (see below) indicates. Similarly, elliptically distributed data have a
certain amount of dependence in the entries of the vector $r$ since its norm is
1. (We note that [35] allowed for dependence, too, and one of our questions
was whether one could recover (and generalize) those results from a different
angle than the one taken in [35].) Also, our matrix $\Gamma$ is $d \times p$ and usually
only square matrices are considered. Interestingly, the result shows that the
ratio $d/p$ plays a nontrivial role in the limiting spectral measure. One of
our aims here is to show that independence in the entries of $X_{n,p}$ is not
the key element; rather, we will rely on the fact that the rows of $X_{n,p}$ are
independent and that the distribution of the corresponding vectors satisfies
certain concentration properties.

As our proof will make clear, using the "rank one perturbation" method 
originally proposed in [45] and [44], proofs of convergence of spectra of ran- 
dom matrices basically boil down to concentration of certain quadratic forms 
and concentration of Stieltjes transforms, the latter being achievable using 
Azuma's inequality. We discuss these two aspects in Section 3.2 and Section 
3.1, respectively. We chose to separate the results of these two subsections 
from the main proof because we believe that the results are of interest in 
their own right and that their technical nature would obscure the proof of 
Theorem 2 if they were treated there. As far as we know, many results cov- 
ered by Theorem 2 are new and cannot be achieved with other methods 
involving (in one way or another) moment computations. 

One of our points is that the importance of concentration inequalities 
in this context appears not to have been realized and that they permit 
generalizations of random matrix results to problems that look intractable 
by other methods. 

3.1. Concentration of Stieltjes transforms. We present a result of independent
interest, namely, the fact that the Stieltjes transform of a matrix
which is the sum of n independent rank one matrices is asymptotically
equivalent to a deterministic function. We have somewhat more than this: we show
concentration around its mean, which also immediately gives us some lower
bounds on the rate of convergence.

Naturally, the result (and its extension, Remark 1) is needed in our proof
(see page 37), which is why it is included here. Another reason to highlight it 
is the fact that it shows that certain existing results that have been obtained 
with "only" convergence in probability actually hold almost surely, by simply 
using the Borel-Cantelli lemma and the following lemma. Finally, it answers 
some practical questions raised in [17], which relied on Stieltjes transforms 
to perform spectral estimation in connection with random matrix results. 

Lemma 6 (Concentration of Stieltjes transforms). Suppose that $M$ is a
$p \times p$ matrix such that

\[ M = \sum_{i=1}^n r_ir_i^*, \]

where the $r_i$ are independent random vectors in $\mathbb{R}^p$. Let

\[ m_p(z) = \frac{1}{p}\,\mathrm{trace}((M - z\mathrm{Id}_p)^{-1}). \]

Then, if $\mathrm{Im}[z] = v$,

\[ P(|m_p(z) - E(m_p(z))| > r) \le 4\exp(-r^2p^2v^2/(16n)). \]

Note that the lemma makes no assumptions whatsoever about the structure
of the vectors $\{r_i\}_{i=1}^n$ other than the fact that they are independent.

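Proof of Lemma 6. We define $M_k = M - r_kr_k^*$. We let $\mathcal{F}_i$ denote the
filtration generated by $\{r_l\}_{l=1}^i$. The first classical step (see [3], page 649) is
to write the random variable of interest as a sum of martingale differences:

\[ m_p(z) - E(m_p(z)) = \sum_{k=1}^n \left[ E(m_p(z)|\mathcal{F}_k) - E(m_p(z)|\mathcal{F}_{k-1}) \right]. \]

We now note that $E(\mathrm{trace}((M_k - z\mathrm{Id}_p)^{-1})|\mathcal{F}_k) = E(\mathrm{trace}((M_k - z\mathrm{Id}_p)^{-1})|\mathcal{F}_{k-1})$.
So,

\[
\begin{aligned}
|E(m_p(z)|\mathcal{F}_k) - E(m_p(z)|\mathcal{F}_{k-1})|
&= \left| E(m_p(z)|\mathcal{F}_k) - E\left(\tfrac{1}{p}\mathrm{trace}((M_k - z\mathrm{Id}_p)^{-1})\,\middle|\,\mathcal{F}_k\right) \right. \\
&\qquad \left. {} + E\left(\tfrac{1}{p}\mathrm{trace}((M_k - z\mathrm{Id}_p)^{-1})\,\middle|\,\mathcal{F}_{k-1}\right) - E(m_p(z)|\mathcal{F}_{k-1}) \right| \\
&\le \left| E\left(m_p(z) - \tfrac{1}{p}\mathrm{trace}((M_k - z\mathrm{Id}_p)^{-1})\,\middle|\,\mathcal{F}_k\right) \right| \\
&\qquad {} + \left| E\left(m_p(z) - \tfrac{1}{p}\mathrm{trace}((M_k - z\mathrm{Id}_p)^{-1})\,\middle|\,\mathcal{F}_{k-1}\right) \right| \\
&\le \frac{2}{pv},
\end{aligned}
\]

the last inequality following from [45], Lemma 2.6. So, $m_p(z) - E(m_p(z))$ is
a sum of bounded martingale differences. Note that the same would be true
for its real and imaginary parts. For both of them, we can apply Azuma's
inequality (see [33], Lemma 4.1) to get that

\[ P(|\mathrm{Re}[m_p(z) - E(m_p(z))]| > r) \le 2\exp(-r^2p^2v^2/(8n)), \]

and similarly for its imaginary part. We therefore conclude that

\[
\begin{aligned}
P(|m_p(z) - E(m_p(z))| > r) &\le P(|\mathrm{Re}[m_p(z) - E(m_p(z))]| > r/\sqrt{2}) \\
&\qquad {} + P(|\mathrm{Im}[m_p(z) - E(m_p(z))]| > r/\sqrt{2}) \\
&\le 4\exp(-r^2p^2v^2/(16n)).
\end{aligned}
\]

□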
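
The key deterministic ingredient of this proof, namely that removing one rank one term changes m_p(z) by at most 1/(pv), is easy to check numerically. The sketch below is only an illustration under our own assumptions: Gaussian vectors r_i with an arbitrary 1/sqrt(n) scaling, an arbitrary point z, and helper names (`stieltjes`, `azuma_bound`) that are ours rather than the paper's.

```python
import numpy as np

def stieltjes(M, z):
    """m_p(z) = (1/p) trace((M - z Id_p)^{-1}) for a real symmetric matrix M."""
    eig = np.linalg.eigvalsh(M)
    return np.mean(1.0 / (eig - z))

rng = np.random.default_rng(1)
n, p = 300, 150                                   # illustrative sizes (assumption)
z = 1.0 + 0.5j
v = z.imag
R = rng.standard_normal((n, p)) / np.sqrt(n)      # rows are the vectors r_i (assumed Gaussian)
M = R.T @ R                                       # M = sum_i r_i r_i^*
m_full = stieltjes(M, z)

# Largest change in m_p(z) when a single term r_k r_k^* is removed from M.
max_jump = max(
    abs(m_full - stieltjes(M - np.outer(R[k], R[k]), z)) for k in range(n)
)
rank_one_bound = 1.0 / (p * v)

def azuma_bound(r):
    """Right-hand side of the bound in Lemma 6 for the current n, p and v."""
    return 4.0 * np.exp(-r**2 * p**2 * v**2 / (16.0 * n))
```

The bound holds for every realization, which is why the lemma needs no assumption on the r_i beyond independence.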



We have the following, immediate, corollary. 



Corollary 2. Suppose that we consider the following sequence of random
matrices: for each p, select n independent p-dimensional vectors. Let
$M = \sum_{i=1}^n r_ir_i^*$. Assume that $p/n$ remains bounded away from 0. Then,

\[ m_p(z) - E(m_p(z)) \to 0 \quad \text{a.s.} \]

and also

\[ \frac{\sqrt{p}}{(\log p)^{(1+\alpha)/2}}\,|m_p(z) - E(m_p(z))| \to 0 \quad \text{a.s., for } \alpha > 0. \]

In other words, $m_p(z)$ is asymptotically deterministic.

Proof. The proof is an immediate consequence of the first Borel-Cantelli 
lemma. □ 

Remark 1. We note that if $S$ is a matrix independent of the $r_i$'s, similar
results would apply to

\[ \frac{1}{p}\,\mathrm{trace}((M - z\mathrm{Id}_p)^{-1}S) \]

after we replace $v$ by $v/|||S|||_2$. In particular, if $|||S|||_2 \le C(\log p)^m$ for some
$m$, we have

\[ \frac{1}{p}\,\mathrm{trace}((M - z\mathrm{Id}_p)^{-1}S) - E\left(\frac{1}{p}\,\mathrm{trace}((M - z\mathrm{Id}_p)^{-1}S)\right) \to 0 \quad \text{a.s.} \]

However, the rate in the second part of the previous corollary needs to be
adjusted.

Remark 2. We note that the rate given by Azuma's inequality does
not match the rate that appears in results concerning fluctuation behavior
of linear spectral statistics, which is $n$ and not $\sqrt{n}$. Of course, our result
encompasses many situations that are not covered by the currently available
results on linear spectral statistics, which might help to explain this
discrepancy. The "correct" rate can be recovered using ideas similar to the
ones discussed in [26] and [33], Chapter 8, Section 5. As a matter of fact,
if we consider the Stieltjes transform of the measure that puts mass $1/p$ at
each of the singular values of $M = X^*X/n$, it is an easy exercise to see that
this function (of $X$) is $1/(\sqrt{np}\,v^2)$-Lipschitz with respect to the Euclidean
(or Frobenius) norm. Hence, if the np-dimensional vector made up of the
entries of $X$ has a distribution that satisfies a dimension-free concentration
property with respect to the Euclidean norm, we find that the fluctuations
of the Stieltjes transform at $z$ occur at rate $\sqrt{np}$, which corresponds to the
"correct" rate found in the analysis of these models. (Note, however, that
results have been shown beyond the case of distributions with dimension-free
concentration.)
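
The Lipschitz property mentioned in Remark 2 can be probed numerically. The sketch below rests on one reading that is our assumption: the measure is taken to put mass 1/p at each singular value of X/sqrt(n) (with n at least p), since singular values are 1-Lipschitz in the Frobenius norm. The function name, dimensions and perturbation size are also illustrative choices of ours.

```python
import numpy as np

def stieltjes_of_singular_values(X, z):
    """Stieltjes transform at z of the measure with mass 1/p at each singular value of X/sqrt(n)."""
    n, p = X.shape
    s = np.linalg.svd(X / np.sqrt(n), compute_uv=False)   # p singular values when n >= p
    return np.mean(1.0 / (s - z))

rng = np.random.default_rng(2)
n, p = 200, 100                                   # illustrative sizes (assumption)
z = 1.0 + 0.3j
v = z.imag
lipschitz_const = 1.0 / (np.sqrt(n * p) * v**2)

ratios = []
for _ in range(20):
    X1 = rng.standard_normal((n, p))
    X2 = X1 + 0.1 * rng.standard_normal((n, p))   # a nearby matrix
    gap = abs(stieltjes_of_singular_values(X1, z) - stieltjes_of_singular_values(X2, z))
    ratios.append(gap / np.linalg.norm(X1 - X2, "fro"))
max_ratio = max(ratios)                           # observed difference quotient
```

Under this reading, the observed difference quotients never exceed the stated constant, because x -> 1/(x - z) has derivative bounded by 1/v^2 on the real line.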

The conclusion of this discussion is that since the spectral distribution 
of random matrices is characterized by their Stieltjes transforms, it is not 
surprising that they are asymptotically nonrandom for a very wide class of 
data matrices of covariance type. We now turn to the examination of another 
type of concentration which we will crucially need in our proof, namely the 
concentration of certain quadratic forms. 

3.2. Concentration of quadratic forms. The key property we will rely on 
in the proof of our main theorem is a concentration property of quadratic 
forms. This property is summarized in Corollary 4 and we now give impor- 
tant sufficient conditions to reach it. 

Lemma 7 (Case of Gaussian concentration). Suppose that the random
vector $r \in \mathbb{R}^p$ has the property that for any convex 1-Lipschitz (with respect
to the Euclidean norm) function $F$ from $\mathbb{R}^p$ to $\mathbb{R}$, we have, if $m_F$ denotes a
median of $F(r)$,

\[ P(|F(r) - m_F| > t) \le C\exp(-c(p)t^2), \]

where $C$ and $c(p)$ are independent of $F$ and $C$ is independent of $p$. We allow
$c(p)$ to be a constant or to go to zero with $p$ like $p^{-\alpha}$, $0 < \alpha < 1$. Suppose,
further, that $E(r) = 0$, $E(rr^*) = \Sigma$, with $|||\Sigma|||_2 \le \log(p)$.






If $M$ is a complex deterministic matrix such that $|||M|||_2 \le \xi$, where $\xi$ is
independent of $p$, then

\[ \frac{1}{p}\,r'Mr \text{ is strongly concentrated around its mean, } \frac{1}{p}\,\mathrm{trace}(M\Sigma). \]

In particular, if, for $\varepsilon > 0$, $t_p(\varepsilon) = \log(p)^{1+\varepsilon}/\sqrt{pc(p)}$, then

\[ \log P\left( \left| \frac{1}{p}\,r'Mr - \frac{1}{p}\,\mathrm{trace}(M\Sigma) \right| > t_p(\varepsilon) \right) \asymp -(\log p)^{1+2\varepsilon}. \]

If $E(r) \ne 0$, then the same results are true when one replaces $r$ by $r - E(r)$
everywhere and $\Sigma$ is the covariance of $r$.

Finally, if $\xi$ is allowed to vary with $p$, then the same results hold when
one replaces $t_p(\varepsilon)$ by $t_p(\varepsilon)\xi$, or, equivalently, divides $M$ by $\xi$.

Proof. In what follows, $K$ denotes a generic constant that may change
from occurrence to occurrence, but which is independent of $p$. First, it is
clear that we can rewrite $M$ as $M = \mathrm{R}M + i\,\mathrm{I}M$, where $\mathrm{R}M$ and $\mathrm{I}M$ are
real matrices. Further, the spectral norm of those matrices is less than $\xi$ [of
course, $\mathrm{R}M = (M + \bar M)/2$, where $\bar M$ is the (entrywise) complex conjugate
of $M$].

Now, strong concentration for $r'\mathrm{R}Mr/p$ and $r'\mathrm{I}Mr/p$ will imply strong
concentration for the sum of those two terms. We note that since $r'\mathrm{R}Mr$ is
real, $r'\mathrm{R}Mr = (r'\mathrm{R}Mr)'$ and

\[ r'\mathrm{R}Mr = r'\left( \frac{\mathrm{R}M + \mathrm{R}M'}{2} \right) r. \]

Hence, instead of working on $\mathrm{R}M$, we can work on its symmetrized version.

Let us now decompose $(\mathrm{R}M + \mathrm{R}M')/2$ into $\mathrm{R}M_+ + \mathrm{R}M_-$, where $\mathrm{R}M_+$ is
positive semidefinite and $-\mathrm{R}M_-$ is positive definite [or 0 if $(\mathrm{R}M + \mathrm{R}M')/2$
itself is positive semidefinite]. This is possible because $(\mathrm{R}M + \mathrm{R}M')/2$ is
real symmetric and we carry out this decomposition by simply following its
spectral decomposition. Note that both matrices have spectral norm less
than $\xi$. Now, the map $\phi: r \mapsto \sqrt{r'\mathrm{R}M_+r/p}$ is $\sqrt{\xi/p}$-Lipschitz (with respect
to the Euclidean norm) and convex, which is easily seen after one notes that
$\sqrt{r'\mathrm{R}M_+r/p} = \|\mathrm{R}M_+^{1/2}r/\sqrt{p}\|_2$. This guarantees, by our assumption, that

\[ P(|\sqrt{r'\mathrm{R}M_+r/p} - m_\phi| > t) \le C\exp(-pc(p)t^2/\xi), \]

where $m_\phi$ is a median of $\phi(r)$.

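Now, using Proposition 1.9 in [33], letting $\mu_\phi$ denote the mean of $\phi(r)$
and observing that $\mathrm{var}(\phi(r)) = E(r'\mathrm{R}M_+r/p) - \mu_\phi^2$, we have

\[ |\mu_\phi - m_\phi| \le \frac{C}{2}\sqrt{\frac{\pi\xi}{pc(p)}} \quad \text{and} \quad 0 \le E(r'\mathrm{R}M_+r/p) - \mu_\phi^2 \le \frac{C\xi}{pc(p)}. \]

We hence deduce, using the fact that $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ for nonnegative
reals, that

\[ \left| \sqrt{E(r'\mathrm{R}M_+r/p)} - m_\phi \right| \le \kappa_p, \quad \text{where } \kappa_p = \left( \frac{C\sqrt{\pi}}{2} + \sqrt{C} \right)\sqrt{\frac{\xi}{pc(p)}}. \]

We also have, trivially,

\[ E(r'\mathrm{R}M_+r/p) = \frac{\mathrm{trace}(\mathrm{R}M_+\Sigma)}{p} \]

since $E(r) = 0$. Therefore, we have that for $u > 0$,

\[ P\left( \left| \sqrt{r'\mathrm{R}M_+r/p} - \sqrt{\mathrm{trace}(\mathrm{R}M_+\Sigma/p)} \right| > u + \kappa_p \right) \le C\exp(-pc(p)u^2/\xi). \]

On the other hand, if $0 < t \le \kappa_p$, then we have

\[
\begin{aligned}
P\left( \left| \sqrt{r'\mathrm{R}M_+r/p} - \sqrt{\mathrm{trace}(\mathrm{R}M_+\Sigma/p)} \right| > t \right)
&\le 1 \le \exp(pc(p)\kappa_p^2/\xi)\exp(-pc(p)(t - \kappa_p)^2/\xi) \\
&\le \max(C, \exp(pc(p)\kappa_p^2/\xi))\exp(-pc(p)(t - \kappa_p)^2/\xi).
\end{aligned}
\]

Since $pc(p)\kappa_p^2/\xi = (C\sqrt{\pi}/2 + \sqrt{C})^2$, we conclude that for any $t > 0$,

\[ P\left( \left| \sqrt{r'\mathrm{R}M_+r/p} - \sqrt{\mathrm{trace}(\mathrm{R}M_+\Sigma/p)} \right| > t \right) \le K\exp(-pc(p)(t - \kappa_p)^2/\xi), \]

where $K$ depends on $C$ and $\xi$, but not on $p$. Note that $\kappa_p \to 0$ since $pc(p) \to \infty$
and $C$ does not depend on $p$.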

We now turn to finding a deviation inequality for the quadratic form of
interest. Let us define

\[
\begin{aligned}
\zeta_p &= \mathrm{trace}(\mathrm{R}M_+\Sigma/p), \\
A &= \{ |r'\mathrm{R}M_+r/p - \zeta_p| > t \}, \\
B &= \{ \sqrt{r'\mathrm{R}M_+r/p} \le \sqrt{\zeta_p} + 1 \}.
\end{aligned}
\]

Our aim is to show that the probability of $A$ is "exponentially small" in
$p$. Of course, we have $P(A) \le P(A \cap B) + P(B^c)$. We note that $P(B^c)$ is
"exponentially small" in $p$ since

\[ P(B^c) = P\left( \sqrt{r'\mathrm{R}M_+r/p} - \sqrt{\mathrm{trace}(\mathrm{R}M_+\Sigma/p)} > 1 \right) \le K\exp(-pc(p)(1 - \kappa_p)^2/\xi) \]

and $pc(p) \to \infty$ at least polynomially fast in $p$ (recall that $\alpha < 1$). Now, note that

\[ A \cap B \subseteq D = \left\{ \left| \sqrt{r'\mathrm{R}M_+r/p} - \sqrt{\zeta_p} \right| > \frac{t}{2\sqrt{\zeta_p} + 1} \right\}. \]

To see this, note simply that for positive reals, $|x - y| = |\sqrt{x} - \sqrt{y}|(\sqrt{x} + \sqrt{y})$.
Finally, because of our bounds on the norm of $\Sigma$ and the fact that
$|||\mathrm{R}M_+|||_2 \le \xi$, we see that $\mathrm{trace}(\mathrm{R}M_+\Sigma/p) = \zeta_p \le \log(p)\xi$. Hence,
$P(D) \le K\exp(-pc(p)(t/(2\sqrt{\zeta_p} + 1) - \kappa_p)^2/\xi)$ for some $K$ independent of $p$ and we
therefore have

\[ P(A) \le K\left[ \exp(-pc(p)(t/(2\sqrt{\zeta_p} + 1) - \kappa_p)^2/\xi) + \exp(-pc(p)(1 - \kappa_p)^2/\xi) \right] = g_p(t). \]

Similarly, we can obtain the same type of bounds for $-r'\mathrm{R}M_-r/p$. From
those, we conclude that

\[ P(|r'\mathrm{R}Mr/p - \mathrm{trace}(\mathrm{R}M\Sigma)/p| > t) \le 2g_p(t/2). \]

Finally,

\[ P(|r'Mr/p - \mathrm{trace}(M\Sigma)/p| > t) \le 4g_p(t/(2\sqrt{2})). \]

From the way $g_p(t)$ behaves, we obtain the result concerning strong
concentration. Now, studying the asymptotics of $g_p(t_p(\varepsilon))$ for large $p$ gives the
statement concerning the log probability in the lemma.

The last statement in the lemma, concerning the replacement of $r$ by
$r - E(r)$ if $E(r) \ne 0$, follows simply from the fact that, for any given $\mu$, the
map $\phi(r) = \sqrt{(r - \mu)'\mathrm{R}M_+(r - \mu)/p}$ is convex and $\sqrt{\xi/p}$-Lipschitz since the
composition of a convex mapping and an affine one is convex. (See, e.g.,
Section 3.2.2 in [10].) □
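
The two constructions used in this proof, symmetrizing the real part of M and splitting it along its spectral decomposition, can be sketched as follows, together with a single draw of the quadratic form. This is only an illustration under our own assumptions: r is standard Gaussian (so that the covariance is the identity), M is a hypothetical random test matrix with spectral norm of order 1, and the helper name `split_symmetric_part` is ours.

```python
import numpy as np

def split_symmetric_part(RM):
    """Return (sym, plus, minus) with sym = (RM + RM')/2 = plus + minus, plus >= 0 >= minus."""
    sym = (RM + RM.T) / 2.0
    w, U = np.linalg.eigh(sym)                    # spectral decomposition of the symmetric part
    plus = (U * np.clip(w, 0.0, None)) @ U.T      # keeps the nonnegative eigenvalues
    minus = (U * np.clip(w, None, 0.0)) @ U.T     # keeps the nonpositive eigenvalues
    return sym, plus, minus

rng = np.random.default_rng(4)
p = 500                                           # illustrative dimension (assumption)
# Hypothetical test matrix M = RM + i IM; each part has spectral norm close to 1.
RM = rng.standard_normal((p, p)) / (2.0 * np.sqrt(p))
IM = rng.standard_normal((p, p)) / (2.0 * np.sqrt(p))
M = RM + 1j * IM

sym, plus, minus = split_symmetric_part(RM)

r = rng.standard_normal(p)                        # E(r) = 0 and Sigma = Id_p (assumed)
quad = r @ M @ r / p                              # (1/p) r' M r
target = np.trace(M) / p                          # (1/p) trace(M Sigma) with Sigma = Id_p
deviation = abs(quad - target)
```

Repeating the draw of r for increasing p should show the deviation shrinking, in line with the strong concentration stated in the lemma.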

Motivated by examples we will see below, we also note that we have the 
following corollary, which is applicable when concentration is not limited to 
convex 1-Lipschitz functions, but holds for any 1-Lipschitz function.

Corollary 3 (Gaussian concentration of nonconvex functionals). Suppose
that the random vector $v$ has the property that for any 1-Lipschitz
(with respect to the Euclidean norm) function $F$ from $\mathbb{R}^p$ to $\mathbb{R}$, we have, if
$m_F$ denotes a median of $F(v)$,

\[ P(|F(v) - m_F| > t) \le C\exp(-c(p)t^2), \]

where $C$ and $c(p)$ are independent of $F$ and $C$ is independent of $p$. We allow
$c(p)$ to be a constant or to go to zero with $p$ like $p^{-\alpha}$, $0 < \alpha < 1$.

Consider $r = \psi(v)$, where $\psi$ is a 1-Lipschitz map from $\mathbb{R}^p$ to $\mathbb{R}^p$, also with
respect to the Euclidean norm. Suppose, as above, that $E(r) = 0$, $E(rr^*) = \Sigma$,
with $|||\Sigma|||_2 \le \log(p)$.

If $M$ is a complex deterministic matrix such that $|||M|||_2 \le \xi$, where $\xi$ is
independent of $p$, then

\[ \frac{1}{p}\,r'Mr \text{ is strongly concentrated around its mean, } \frac{1}{p}\,\mathrm{trace}(M\Sigma). \]

In particular, if for $\varepsilon > 0$, $t_p(\varepsilon) = \log(p)^{1+\varepsilon}/\sqrt{pc(p)}$, then

\[ \log P\left( \left| \frac{1}{p}\,r'Mr - \frac{1}{p}\,\mathrm{trace}(M\Sigma) \right| > t_p(\varepsilon) \right) \asymp -(\log p)^{1+2\varepsilon}. \]

If $E(r) \ne 0$, then the same is true when one replaces $r$ by $(r - E(r))$ everywhere
and $\Sigma$ is the covariance of $r$. Finally, if $\xi$ is allowed to vary with $p$,
then the same results hold when one replaces $t_p(\varepsilon)$ by $t_p(\varepsilon)\xi$ or, equivalently,
divides $M$ by $\xi$.



Proof. The proof follows easily from the arguments developed for Lemma
7 after we note that the map $\phi: \mathbb{R}^p \to \mathbb{R}$ with $\phi(v) = \sqrt{r'\mathrm{R}M_+r/p} =
\sqrt{\psi(v)'\mathrm{R}M_+\psi(v)/p}$ is $\sqrt{\xi/p}$-Lipschitz with respect to the Euclidean norm.
The concentration properties of $v$ can then be invoked and the proof follows
along similar lines as above. □



For applications, it is important to extend the results beyond Gaussian 
concentration. We therefore state the following lemma. 



Lemma 8 (Beyond Gaussian concentration). Suppose that the random
vector $r \in \mathbb{R}^p$ has the property that for any convex 1-Lipschitz (with respect
to the Euclidean norm) function $F$ from $\mathbb{R}^p$ to $\mathbb{R}$, we have, for $b > 0$
independent of $p$ and $m_F$ a median of $F$,

\[ P(|F(r) - m_F| > t) \le C\exp(-c(p)t^b), \]

where $C$ and $c(p)$ are independent of $F$ and $C$ is independent of $p$. We allow
$c(p)$ to be a constant or to go to zero with $p$ like $p^{-\alpha}$, $0 < \alpha < b/2$. Suppose,
further, that $E(r) = 0$, $E(rr^*) = \Sigma$, with $|||\Sigma|||_2 \le \log(p)$.

If $M$ is a complex deterministic matrix such that $|||M|||_2 \le \xi$, where $\xi$ is
independent of $p$, then

\[ \frac{1}{p}\,r'Mr \text{ is strongly concentrated around its mean, } \frac{1}{p}\,\mathrm{trace}(M\Sigma). \]

In particular, if, for $\varepsilon > 0$, $t_p(\varepsilon) = \log(p)^{1/2+1/b+\varepsilon}/\sqrt{pc^{2/b}(p)}$, then

\[ \log P\left( \left| \frac{1}{p}\,r'Mr - \frac{1}{p}\,\mathrm{trace}(M\Sigma) \right| > t_p(\varepsilon) \right) \asymp -(\log p)^{1+b\varepsilon}. \]

If $E(r) \ne 0$, then the same is true when one replaces $r$ by $(r - E(r))$ everywhere
and $\Sigma$ is the covariance of $r$.

Finally, if $\xi$ is allowed to vary with $p$, then the same results hold when
one replaces $t_p(\varepsilon)$ by $t_p(\varepsilon)\xi$ or, equivalently, divides $M$ by $\xi$.

Proof. We only give a sketch of the proof. The ideas are exactly the
same as those above. However, when studying the concentration of $\sqrt{r'\mathrm{R}M_+r/p}$,
the exponent of the exponential is, to leading order, $p^{b/2}c(p)(t - \kappa_p)^b$. We
note that $\kappa_p$ will be somewhat different in its form than it was in the Gaussian
concentration case. This comes from the fact, following the analysis in
Proposition 1.9 of [33], that the inequalities we now have, if $\mu_F$ denotes the
mean of $F$, are

\[ |\mu_F - m_F| \le \frac{C\,\Gamma(1/b)}{b\,c(p)^{1/b}} \quad \text{and} \quad \mathrm{var}(F) \le \frac{2C\,\Gamma(2/b)}{b\,c(p)^{2/b}}, \]

where $\Gamma$ denotes the Gamma function. With this adjustment, the previous
proof proves the present lemma. □

A corollary similar to Corollary 3 holds for the variant of Lemma 8 where 
concentration is not limited to convex 1-Lipschitz functionals, but is valid 
for any 1-Lipschitz (with respect to the Euclidean norm) functional. 

Examples of distributions for which the previous results apply. 

• Gaussian random variables with $|||\Sigma|||_2 \le \log(p)$. Lemma 7 and Corollary
3 apply, according to [33], Theorem 2.7, with $c(p) = 1/|||\Sigma|||_2$.

• Vectors of the type $\sqrt{p}\,r$, where $r$ is uniformly distributed on the unit
($\ell_2$-)sphere in dimension $p$. Theorem 2.3 in [33] shows that Lemma 7
(and Corollary 3) applies, with $c(p) = (1 - 1/p)/2$, after noting that a
1-Lipschitz function with respect to the Euclidean norm is also 1-Lipschitz
with respect to the geodesic distance on the sphere. As we will see below,
this will allow us to treat the case of elliptically distributed data.

• Vectors $\Gamma\sqrt{p}\,r$, with $r$ uniformly distributed on the unit ($\ell_2$-)sphere in $\mathbb{R}^p$
and with $\Gamma\Gamma' = \Sigma$ having the characteristics explained in Lemma 7.

• Vectors of the type $p^{1/b}r$, $1 \le b \le 2$, where $r$ is uniformly distributed in
the unit ball or sphere in $\mathbb{R}^p$. (See [33], Theorem 4.21, which refers to
[43] as the source of the theorem.) Lemma 8 applies to them, with $c(p)$
depending only on $b$.

• Vectors with log-concave density of the type $e^{-U(x)}$, with the Hessian of
$U$ satisfying, for all $x$, $\mathrm{Hess}(U) \ge c\,\mathrm{Id}_p$, where $c > 0$ has the characteristics
of $c(p)$ in Lemma 7; see [33], Theorem 2.7. Here, we also need $|||\Sigma|||_2$ to
satisfy the assumptions of Lemma 7. Corollary 3 also applies here.

• Vectors ($r$) distributed according to a (centered) Gaussian copula, with
corresponding correlation matrix $\Sigma$ such that $|||\Sigma|||_2$ is bounded. Here,
we can apply Corollary 3 since if $\tilde r$ has a Gaussian copula distribution,
then its $i$th entry satisfies $\tilde r_i = \Phi(v_i)$, where $v$ is multivariate normal with
covariance matrix $\Sigma$, $\Sigma$ being a correlation matrix, that is, its diagonal is
1. Here, $\Phi$ is the cumulative distribution function of a standard normal
distribution, which is trivially Lipschitz. Now, taking $r = \tilde r - 1/2$ gives a
centered Gaussian copula. The fact that the covariance matrix of $r$ then
has bounded operator norm requires a little work and is shown in the
Appendix.

• Vectors with i.i.d. entries bounded by $1/\sqrt{c(p)}$. See Corollary 4.10 in [33]
for the concentration part, which shows that Lemma 7 applies. We crucially
need the fact that the concentration of measure result is valid "only"
for convex 1-Lipschitz functions. As we will explain below, in our main
theorem, this result will enable us to work with random variables with
bounded second moment since using an argument similar to those in [45],
those random variables can be truncated at $\log(p)$ without (a.s.) changing
the limiting spectral distribution.
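
As an illustration of the Gaussian copula example in the list above, the following sketch draws centered Gaussian copula vectors by applying the standard normal c.d.f. entrywise to a correlated Gaussian vector. The function name `centered_gaussian_copula`, the AR(1)-type correlation matrix and the sample sizes are assumptions made only for this example; the c.d.f. is computed from `math.erf` to keep the block self-contained.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def centered_gaussian_copula(n, corr, rng):
    """Rows r = Phi(v) - 1/2 with v ~ N(0, corr), Phi being the standard normal c.d.f."""
    L = np.linalg.cholesky(corr)
    v = rng.standard_normal((n, corr.shape[0])) @ L.T    # rows have covariance corr
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2, hence Phi(x) - 1/2 = erf(x / sqrt(2)) / 2.
    return 0.5 * _erf(v / math.sqrt(2.0))

rng = np.random.default_rng(5)
n, p, rho = 2000, 50, 0.5                         # illustrative values (assumption)
idx = np.arange(p)
corr = rho ** np.abs(idx[:, None] - idx[None, :]) # hypothetical correlation matrix, unit diagonal
r = centered_gaussian_copula(n, corr, rng)
col_means = r.mean(axis=0)                        # should be close to 0 after centering
```

The chosen correlation matrix has operator norm bounded in p, which matches the boundedness assumption made in that bullet.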

It also appears possible to use this method to treat more "exotic"
examples involving vectors sampled uniformly from certain Riemannian
submanifolds of $\mathbb{R}^p$, a question which is sometimes of interest in multivariate
statistics and computer science. We refer to [33], Theorems 2.4 and 3.1, for 
the concentration aspects of these questions. 

We also have the following, important, corollary that will play a key role 
in the proof of Theorem 2. 



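Corollary 4. Suppose that $\{r_i\}_{i=1}^n$ are independent random vectors
whose distributions satisfy the hypotheses of Lemmas 7 or 8, or Corollary 3.
Suppose that $n \asymp p$. Suppose that $M_i$ are random matrices, $M_i$ being
independent of $r_i$ and such that $|||M_i|||_2 \le K$, where $K$ is nonrandom. Suppose,
further, that for some matrix $\mathcal{M}$, some $\kappa_p$ with $\kappa_p = O(K/p)$ and all
Hermitian matrices $A$,

\[ \forall i, \quad \left| \frac{1}{p}\,\mathrm{trace}(M_iA) - \frac{1}{p}\,\mathrm{trace}(\mathcal{M}A) \right| \le |||A|||_2\,\kappa_p. \]

Then, for any $\varepsilon > 0$, if $\tilde r_i = r_i - E(r_i)$, we have

\[ \frac{\sqrt{pc^{2/b}(p)}}{(\log p)^{(1/2+1/b+\varepsilon)}K} \max_{i=1,\ldots,n} \left| \frac{1}{p}\,\tilde r_i'M_i\tilde r_i - \frac{1}{p}\,\mathrm{trace}(\mathcal{M}\Sigma) \right| \to 0 \quad \text{a.s.} \tag{1} \]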



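Proof. From the previous results, we have

\[
\begin{aligned}
P\left( \max_{i=1,\ldots,n} \left| \frac{1}{p}\,\tilde r_i'M_i\tilde r_i - \frac{1}{p}\,\mathrm{trace}(M_i\Sigma) \right| > t \right)
&\le \sum_{i=1}^n P\left( \left| \frac{1}{p}\,\tilde r_i'M_i\tilde r_i - \frac{1}{p}\,\mathrm{trace}(M_i\Sigma) \right| > t \right) \\
&\le 4n\,g_p(t/(2\sqrt{2}))
\end{aligned}
\]

by conditioning on $M_i$ to compute each probability in the sum. Therefore,
using the first Borel-Cantelli lemma, the results above and the fact that
$n \asymp p$, we have

\[ \frac{\sqrt{pc^{2/b}(p)}}{(\log p)^{(1/2+1/b+\varepsilon)}K} \max_{i=1,\ldots,n} \left| \frac{1}{p}\,\tilde r_i'M_i\tilde r_i - \frac{1}{p}\,\mathrm{trace}(M_i\Sigma) \right| \to 0 \quad \text{a.s.} \]

and because $|\frac{1}{p}\mathrm{trace}(M_i\Sigma) - \frac{1}{p}\mathrm{trace}(\mathcal{M}\Sigma)| \le \kappa_p|||\Sigma|||_2 \le \kappa_p\log(p)$, we
conclude that

\[ \frac{\sqrt{pc^{2/b}(p)}}{(\log p)^{(1/2+1/b+\varepsilon)}K} \max_{i=1,\ldots,n} \left| \frac{1}{p}\,\tilde r_i'M_i\tilde r_i - \frac{1}{p}\,\mathrm{trace}(\mathcal{M}\Sigma) \right| \to 0 \quad \text{a.s.} \]

□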

We also have the following technical result that will be useful below. 

Corollary 5. We assume that Lemma 7, Lemma 8 or Corollary 3
applies and that $\mathrm{trace}(\Sigma)/p$ is bounded by $K$ independent of $p$. If $\{r_i\}_{i=1}^n$ is a
triangular array of independent random variables and $n/p$ remains bounded,
then the spectral distribution of $A_n = \sum_{i=1}^n r_ir_i^*/n$ is a.s. tight.

Proof. We first assume that $E(r_i) = 0$. Let $R_n$ denote the matrix whose
$i$th row is $r_i'$. We consider the first moment of the spectral distribution of
$A_n = R_n'R_n/n$, which is equal to $M_1$, with $M_1 = \frac{1}{n}\sum_{i=1}^n \mathrm{trace}(r_ir_i^*/p)$. Its
mean is $\mathrm{trace}(\Sigma)/p$. As we just saw, $r_i^*r_i/p$ is strongly concentrated around
$\mathrm{trace}(\Sigma)/p$ $(= E(M_1))$ and this property transfers to $M_1$ using the fact that
$P(|M_1 - E(M_1)| > t) \le nP(|r_i^*r_i/p - E(r_i^*r_i/p)| > t)$. Because $\mathrm{trace}(\Sigma)/p$ is
assumed to be bounded, we see that $M_1$ is a.s. bounded by $K + 1$. Because
it is the first moment of the spectral distribution of $A_n$, we conclude that,
if we let $F^{A_n}$ denote the c.d.f. of the spectral distribution of $A_n$, we have
a.s. $F^{A_n}([M, \infty)) \le (K + 1)/M$ for $n$ sufficiently large. Since the spectral
distribution of $A_n$ is supported on $[0, \infty)$, we conclude that it is a.s. tight.

In the case where the $r_i$ do not have mean 0, we can work with $\tilde r_i = r_i - E(r_i)$.
The resulting matrix $\tilde R_n$ is a perturbation of $R_n$ of rank at most 3 and
therefore generates the same limiting spectral distribution as that generated
by $R_n$. Therefore, the previous arguments applied to $\tilde R_n$ give the result for
$R_n$. □



We conclude this concentration discussion by considering some practical 
consequences. 

Practical geometric consequences of concentration. If we apply the previous
results to the matrix $M = \mathrm{Id}_p$, we see that our concentration results
indicate that if $E(r) = 0$, then $\|r\|_2^2/p$ is strongly concentrated around
$\mathrm{trace}(\Sigma_p)/p$. In particular, if $n \asymp p$, we see that $\max_{i=1,\ldots,n} |\,\|r_i\|_2^2/p - \mathrm{trace}(\Sigma_p)/p\,|$
will tend to 0 a.s. Hence, the vectors $r_i/\sqrt{p}$ appear to be located close to a
sphere. By properly choosing $M$, for instance, as a block matrix with 0 on
the block diagonal and $\mathrm{Id}_p$ off the diagonal, we can also show (see [15] for
details) that $\max_{i \ne j} |r_i'r_j/p|$ tends to 0 a.s. and hence, provided $\mathrm{trace}(\Sigma_p)/p$
stays bounded away from 0, so does the maximal absolute cosine of the angle
between two such vectors. So, perhaps surprisingly, the vectors $r_i$ appear
to be almost orthogonal to one another. These remarks suggest that
although the models we study are quite general, their geometric properties
are somewhat peculiar. Hence, one should probably check (even just graphically)
whether such geometric features are present in the data before using
random matrix results.
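
A quick numerical version of this geometric check can be sketched as follows. It is only an illustration, assuming i.i.d. standard Gaussian vectors (so that the normalized trace of the covariance equals 1) and arbitrary dimensions; with real data, one would replace `R` by the (centered) data matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 2000                                  # illustrative sizes (assumption)
R = rng.standard_normal((n, p))                   # rows r_i with E(r_i) = 0, Sigma_p = Id_p

gram = R @ R.T / p                                # entries r_i' r_j / p
# Deviation of the squared norms ||r_i||^2 / p from trace(Sigma_p)/p = 1.
norm_dev = np.max(np.abs(np.diag(gram) - 1.0))
# Largest normalized inner product between two distinct vectors.
off_diag = gram - np.diag(np.diag(gram))
max_inner = np.max(np.abs(off_diag))
print(norm_dev, max_inner)
```

Small values of both quantities indicate vectors lying close to a sphere and nearly orthogonal to one another, which is the geometric signature described above.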

3.3. Marcenko-Pastur-type system for covariance matrices computed from
generalized elliptically distributed data. We refer the reader to the
discussion introducing Section 3 for a review of literature concerning elliptically
distributed data and some motivation for the theorem that follows. In what
follows, we assume that we have a triangular "array" of random variables,
where the $n$th line contains $n$ i.i.d. $\lambda_i$'s and $n$ i.i.d. $r_i$'s in $\mathbb{R}^p$ satisfying
concentration inequalities as in Lemma 7, Lemma 8 or Corollary 3. We assume
that the $r_i$'s have covariance matrix $\Sigma$ such that $|||\Sigma|||_2 \le \log(p)$. We also
have to work with a $d \times p$ matrix $\Gamma$. The data vectors we will focus on are
therefore the "array" of

\[ v_i = \mu + \lambda_i\Gamma r_i, \]

which we say have generalized elliptical distributions. In what follows, we
allow $S = \Gamma\Gamma'$ to be random, as long as it is independent of the vectors $r_i$.
For all practical purposes, however, $S$ can be considered deterministic.

We present the theorem in the form that makes it most natural for ellip- 
tically distributed data, our original motivation. 

Theorem 2. Let $\{\{v_i\}_{i=1}^n\}_{n=1}^\infty$ form a triangular array of independent
random vectors, "generalized-elliptically" distributed, as described above. In
particular, recall that they are in $\mathbb{R}^d$.

• Define $\theta_n = d/p$, $\rho_n = p/n$, $\xi_n = d^2/(np) = \theta_n^2\rho_n$.

• Let $G_d$ denote the spectral distribution of $\Gamma\Gamma' = S$, $H_d$ the spectral
distribution of $\Gamma\Sigma\Gamma' = T$ (S and T are $d \times d$) and $\nu_n$ the spectral distribution
of the diagonal matrix containing the $\lambda_i$'s.

• Assume that $H_d$ converges weakly a.s. to a probability distribution $H \neq 0$.
Assume, further, that $\int \tau\, dH_d(\tau)$ remains bounded.

• Assume that $G_d$ converges weakly a.s. to a probability distribution $G \neq 0$.

• Assume that $\nu_n$ converges weakly a.s. to a probability distribution $\nu \neq 0$.






Let X denote the $n \times d$ data matrix whose ith row is $v_i$. Consider the matrix

$$B_n = \frac{d}{p}\,\frac{1}{n}\sum_{i=1}^n v_i v_i' = \sum_{i=1}^n u_i u_i'.$$

If $\rho_n$ has a finite nonzero limit $\rho$ and $\theta_n$ has a finite nonzero limit $\theta$, then
$\xi_n$ obviously has a finite nonzero limit $\xi$ and the Stieltjes transform of
$B_n$, $m_n$, converges a.s. to a deterministic limit m satisfying the equations

$$m(z) = \int \frac{dH(\tau)}{\tau \int \theta\lambda^2/(1+\xi\lambda^2 w(z))\,d\nu(\lambda) - z} \quad\text{and}$$

$$w(z) = \int \frac{\tau\,dH(\tau)}{\tau \int \theta\lambda^2/(1+\xi\lambda^2 w(z))\,d\nu(\lambda) - z}.$$

w is the unique solution of this equation mapping $\mathbb{C}^+$ into $\mathbb{C}^+$. (The intuitive
meaning of w is explained below. We also remind the reader that m uniquely
characterizes the limiting spectral distribution of $B_n$.)
We note, further, that we have

$$1 + z\,m(z) = w(z)\int \frac{\theta\lambda^2}{1+\xi\lambda^2 w(z)}\,d\nu(\lambda).$$

The same results hold for the scaled sample covariance matrix
$(d/p)(X-\bar X)'(X-\bar X)/n$ since it is a finite-rank perturbation of $B_n$.

The conclusion is that the limiting spectral distribution of $B_n$ is a
nonrandom probability measure and is uniquely characterized by the previous
system of two equations. 
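
To make the system concrete, here is a minimal numerical sketch; it is not the paper's procedure. It assumes discrete H and nu (the atoms, weights, sizes n and p, and the function name `solve_system` are all illustrative choices), solves the equation for w(z) by plain fixed-point iteration, and compares the resulting m(z) with the empirical Stieltjes transform of a simulated $B_n$ built with Gamma = Id_p (so d = p, theta = 1 and xi = rho), Sigma = Id_p and Gaussian r_i. The point z is chosen with negative real part, where the iteration is a contraction for these particular parameters; convergence of this naive iteration at other points z is not guaranteed.

```python
import numpy as np

def solve_system(z, tau, h_wts, lam2, nu_wts, theta, xi, n_iter=200):
    """Solve the two-equation system for discrete H and nu.

    tau, h_wts   : atoms and weights of H
    lam2, nu_wts : values of lambda^2 at the atoms of nu, and their weights
    Returns (m, w, b), where b = int theta*lambda^2/(1 + xi*lambda^2*w) dnu.
    """
    w = 1j  # starting point with positive imaginary part
    for _ in range(n_iter):
        b = np.sum(nu_wts * theta * lam2 / (1.0 + xi * lam2 * w))
        w = np.sum(h_wts * tau / (tau * b - z))
    b = np.sum(nu_wts * theta * lam2 / (1.0 + xi * lam2 * w))
    m = np.sum(h_wts / (tau * b - z))
    return m, w, b

# Illustrative setting (assumed): H = delta_1, lambda^2 in {0.5, 1.5}, equal weights.
n, p = 600, 300
rho = p / n
theta, xi = 1.0, rho            # d = p, so theta = 1 and xi = rho
z = -1.0 + 1.0j
tau, h_wts = np.array([1.0]), np.array([1.0])
lam2, nu_wts = np.array([0.5, 1.5]), np.array([0.5, 0.5])
m, w, b = solve_system(z, tau, h_wts, lam2, nu_wts, theta, xi)

# Simulated B_n = (theta_n / n) * sum_i lambda_i^2 r_i r_i' with r_i ~ N(0, Id_p).
rng = np.random.default_rng(0)
R = rng.standard_normal((n, p))
lam2_i = np.repeat(lam2, n // 2)             # half of the rows get each value
B = theta * (R * lam2_i[:, None]).T @ R / n
eigs = np.linalg.eigvalsh(B)
m_emp = np.mean(1.0 / (eigs - z))            # empirical Stieltjes transform at z

print("m(z) from the system  :", m)
print("m_n(z) from simulation:", m_emp)
print("|1 + z m - w b|       :", abs(1 + z * m - w * b))
```

The last printed quantity checks the relation 1 + z m(z) = w(z) b(z), with b(z) the inner integral of the system, on the computed solution.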

We note that our finite-rank perturbation arguments (see the introduction
to Section 3) allow $\mu$ and E(r) to be arbitrary. However, in the proof, we
can, and will, assume (without loss of generality) that $\mu = 0$ and E(r) = 0.

In the proof, we do not actually need the $\lambda_i$'s to be independent of each
other. We only need them to be independent of the $r_i$'s and their empirical
distribution to converge a.s. to a deterministic limit, $\nu$. In the case of i.i.d.
$\lambda_i$'s, we note that $\nu_n$ has an almost sure limit $\nu$ by the Glivenko-Cantelli
theorem ([48], Theorem 19.1) for triangular arrays. (A simple modification
to the proof given in [48], which is not for triangular arrays, can be obtained
using Hoeffding's inequality for the variables $1_{\lambda_i \le t}$, which guarantees that
the result is true for triangular arrays.)

We note that, perhaps interestingly, the proof could be adapted to show
that quantities of the type $\mathrm{trace}(T^k(B_n - z\mathrm{Id}_d)^{-1})/d$ satisfy the same
equation as w, with $\tau$ raised to the power k at the numerator and the same
denominator involving w, provided the $H_d$'s have enough moments. (Note
that this is the case for m, with k = 0, and w, which basically corresponds to
k = 1.)

To make the theorem more concrete, we now give a few examples of
distributions to which it can be applied. The concentration justifications
appear in Section 3.2.

• Elliptical distributions. In this case, $r_i = \sqrt{p}\,\tilde r_i$, with $\tilde r_i$ uniformly
distributed on the sphere, so $\Sigma = \mathrm{Id}_p$. Note, in particular, that $\lambda_i$ can have a
Cauchy distribution or any heavy-tailed distribution. The theorem hence
describes the limiting measure obtained when using data sampled according
to the multivariate t or Cauchy distributions.

• Data distributed according to a Gaussian copula, with corresponding
correlation matrix, R, bounded in operator norm. In this case, $\lambda_i = 1$,
$r_i = \Phi(\tilde r_i)$, where $\tilde r_i \sim \mathcal{N}(0,R)$, and $\Phi$ is the c.d.f. of the standard normal
distribution. The theorem then says that the Marcenko-Pastur equation
holds when $r_i$ is sampled according to this distribution. This example, in
particular, appears to be out of reach of methods relying in one way or
another on moment computations.

• $r_i = p^{1/b}\,\tilde r_i$, where $\tilde r_i$ is sampled uniformly from the unit $\ell_b$-ball or sphere,
$1 \le b \le 2$, in $\mathbb{R}^p$. We refer the reader to [33] pages 37-38 for some of the
subtleties which arise for the sphere when $1 \le b \le 2$.

• $r_i$ has i.i.d. entries with finite second moment. Then, using the truncation
arguments in [45], we see that we can truncate $r_i$ at level log(p) without
a.s. affecting the limiting spectral distribution. (The arguments in [45] are
rank arguments and carry directly over to our situation.) We then have
$c(p) = (\log(p))^{-2}$ when using Lemma 7. Here, the convexity assumption
mentioned in Lemma 7 is necessary, as we rely crucially on Corollary 4.10
in [33] for the concentration arguments.

• We note that if $\lambda_i = 1$ and d = p, then we recover the Marcenko-Pastur
equation. The theorem therefore provides an extension of the known range 
of validity of this result. (We note that our result is in that case related 
to [39].) In this setting, the practical geometric remark made at the end 
of Section 3.2 applies and, hence, one should probably perform simple 
graphical diagnostics on the data before relying on insights drawn from 
random matrix results. 
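
As an illustration of the first two examples, the following sketch generates such data. Everything specific in it is an assumption made for the example: the sizes n and p, the standard Cauchy choice for the heavy-tailed scale lambda_i, the equicorrelated matrix R, and the helper names `sample_sphere` and `std_normal_cdf`.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 50

def sample_sphere(n, p, rng):
    """n vectors sqrt(p) * (uniform on the unit sphere of R^p), one per row."""
    g = rng.standard_normal((n, p))
    return math.sqrt(p) * g / np.linalg.norm(g, axis=1, keepdims=True)

# Elliptical data v_i = lambda_i * r_i with a heavy-tailed lambda_i
# (Gamma = Id_p and mu = 0 in the notation of the theorem).
r = sample_sphere(n, p, rng)
lam = rng.standard_cauchy(n)
v_elliptical = lam[:, None] * r

# Gaussian copula data: Phi applied entrywise to vectors drawn from N(0, R).
std_normal_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
R = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)    # assumed correlation matrix
L = np.linalg.cholesky(R)
r_tilde = rng.standard_normal((n, p)) @ L.T     # rows ~ N(0, R)
v_copula = std_normal_cdf(r_tilde)

# The sphere-based r_i have covariance Id_p, as stated in the first example.
emp_cov = r.T @ r / n
print(np.max(np.abs(emp_cov - np.eye(p))))
```

The sample covariance of the sphere-based vectors is close to the identity, while the elliptical data themselves need not have any finite moments when lambda_i is Cauchy.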

The system of equations we have found is unfortunately not trivial to exploit 
in order to gain further understanding of the spectra of the matrices at stake; 
we postpone a detailed investigation of its consequences to a further project. 
We now turn to the proof of Theorem 2. 

3.3.1. Preliminaries. We note that the matrix we are considering is of
the form $\Gamma X' D X \Gamma'$, where D is a diagonal matrix containing the $\lambda_i^2$'s, that
is, $D_{ij} = 1_{i=j}\lambda_i^2$. We let $s_i$ denote the eigenvalues of $S = \Gamma\Gamma'$.

We denote by $\|F\|$ the value $\sup_x |F(x)|$ and by $F^M$ the c.d.f. of the
spectral distribution of the matrix M. We see, using Lemma 2.5 in [45],
that

$$\|F^{Q^* T_1 Q} - F^{\tilde Q^* \tilde T_1 \tilde Q}\| \le \frac{1}{p}\big(\mathrm{rank}(T_1 - \tilde T_1) + 2\,\mathrm{rank}(Q - \tilde Q)\big).$$

In our situation, we have $Q = X\Gamma'$ and $T_1 = D$, so, using the fact that
$\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$, we conclude that

$$\|F^{Q^* T_1 Q} - F^{\tilde Q^* \tilde T_1 \tilde Q}\| \le \frac{1}{p}\big(\mathrm{rank}(D - \tilde D) + 2\,\mathrm{rank}(\Gamma' - \tilde\Gamma')\big).$$

Let us now choose for $\tilde D$ the diagonal matrix with entries $\lambda_i^2 1_{\lambda_i^2 \le \alpha_p}$, which
we abbreviate by $D1_{D \le \alpha_p}$, and let $\tilde\Gamma' = \Gamma' 1_{|s| \le \beta_p}$ (this is understood using
the singular value decomposition of $\Gamma'$, where we keep the singular values
that are less than $\sqrt{\beta_p}$ and replace the others by 0).

We see that $\mathrm{rank}(D - \tilde D) = \sum_{i=1}^n 1_{\lambda_i^2 > \alpha_p}$ and, similarly,
$\mathrm{rank}(\Gamma' - \tilde\Gamma') \le \sum_{i=1}^d 1_{|s_i| > \beta_p}$. Since we assumed that $G_d$ converges weakly a.s. to G
and $\nu_n$ converges weakly a.s. to $\nu$, we conclude that for $\alpha_p = \beta_p = \log p$,
$\mathrm{rank}(\Gamma' - \tilde\Gamma')/p \to 0$ a.s. and $\mathrm{rank}(D - \tilde D)/p \to 0$ a.s. Here, it is important
that d/p and p/n have finite nonzero limits.

So, to prove the theorem, it is sufficient to prove it for D and S bounded in 
operator norm by, for instance, logp since we just showed that by truncating 
S and D at these levels, we will not change the limiting spectral distribution 
of the matrices of interest, provided it exists. 
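
The rank inequality behind this truncation step can be checked numerically. The sketch below is only an illustration under assumed choices (Gamma = Id_p, Gaussian X, squared Cauchy values for the lambda_i^2, truncation level log p, and the helper name `sup_cdf_distance`); it verifies that truncating D moves the spectral c.d.f. by at most rank(D - D_truncated)/p in sup norm.

```python
import numpy as np

def sup_cdf_distance(eigs_a, eigs_b):
    """Sup-norm distance between the two empirical spectral c.d.f.s."""
    a, b = np.sort(eigs_a), np.sort(eigs_b)
    grid = np.concatenate([a, b])   # both c.d.f.s only jump at these points
    F_a = np.searchsorted(a, grid, side="right") / a.size
    F_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(F_a - F_b))

rng = np.random.default_rng(1)
n, p = 400, 200
X = rng.standard_normal((n, p))
lam2 = rng.standard_cauchy(n) ** 2                   # heavy-tailed lambda_i^2 (assumed)
alpha_p = np.log(p)
lam2_trunc = np.where(lam2 <= alpha_p, lam2, 0.0)    # diagonal of the truncated D

B = (X * lam2[:, None]).T @ X / n                    # X' D X / n
B_trunc = (X * lam2_trunc[:, None]).T @ X / n        # same with the truncated D

rank_diff = int(np.sum(lam2 > alpha_p))              # rank of D minus its truncation
dist = sup_cdf_distance(np.linalg.eigvalsh(B), np.linalg.eigvalsh(B_trunc))
print(dist, "<=", rank_diff / p)
```

At these small sizes the bound itself is far from 0, since log p grows slowly; the sketch only illustrates the inequality, not the limiting statement.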

3.3.2. Proof of Theorem 2. As explained in Section 3.3.1, we can, and
do, assume that all of the eigenvalues of $S = \Gamma\Gamma'$ are less than log p and,
similarly, we assume that $|\lambda_i| \le \sqrt{\log p}$ since, as we have explained, these
assumptions do not affect the limiting spectral distribution of $B_n$. We also
recall that we assume that $|||\Sigma|||_2 \le \log(p)$. We call the spectral measures
obtained after truncation $\tilde G_d$, $\tilde H_d$ and $\tilde\nu_n$ to keep track of the modifications
we have induced by truncation. However, to avoid cumbersome notation, we
use S, T and $\Gamma$ to refer to the matrices we deal with. ($\tilde S$ and $\tilde T$ might have
been more appropriate, but the notation would be too heavy.) The approach
we use follows the "rank one perturbation" approach developed in [45] and
[44].

We remind the reader that, as clearly explained in, for instance, [22], 
one can show vague convergence of distributions by showing pointwise con- 
vergence of Stieltjes transforms. This is the approach we take and we will 
therefore show convergence at fixed z of the Stieltjes transforms of interest. 
Finally, we will need a tightness result to go from vague to weak convergence. 
We now turn to the actual proof. 

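Recall that $u_k = \sqrt{\theta_n/n}\,\lambda_k \Gamma r_k$, $\theta_n = d/p$ and $B_n = \sum_{i=1}^n u_i u_i'$. We define

$$M_k = (B_{(k)} - z\,\mathrm{Id}_d)^{-1}, \qquad B_{(k)} = B_n - u_k u_k',$$

and

$$\beta(z) = \frac{1}{n}\sum_{k=1}^n \frac{\theta_n\lambda_k^2}{1 + u_k' M_k u_k}.$$

We note that $B_n$ is $d \times d$, as are all of the other matrices involved here.
Using the first resolvent identity $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ and the fact
that (see [44])

$$B_n(B_n - z\,\mathrm{Id}_d)^{-1} = \mathrm{Id}_d + z(B_n - z\,\mathrm{Id}_d)^{-1} = \sum_{k=1}^n \frac{u_k u_k' M_k}{1 + u_k' M_k u_k}, \tag{2}$$

we have

$$(\beta(z)T - z\,\mathrm{Id}_d)^{-1} - (B_n - z\,\mathrm{Id}_d)^{-1} = (\beta(z)T - z\,\mathrm{Id}_d)^{-1}\,(B_n - \beta(z)T)\,(B_n - z\,\mathrm{Id}_d)^{-1}$$

and, hence,

$$(\beta(z)T - z\,\mathrm{Id}_d)^{-1} - (B_n - z\,\mathrm{Id}_d)^{-1} = \sum_{k=1}^n \frac{1}{1 + u_k' M_k u_k}\Big[(\beta(z)T - z\,\mathrm{Id}_d)^{-1} u_k u_k' M_k - \frac{\theta_n\lambda_k^2}{n}(\beta(z)T - z\,\mathrm{Id}_d)^{-1} T (B_n - z\,\mathrm{Id}_d)^{-1}\Big].$$

Taking traces and dividing by d, we get

$$\int \frac{dH_d(\tau)}{\beta(z)\tau - z} - m_n(z) = \frac{1}{d}\sum_{k=1}^n \frac{1}{1 + u_k' M_k u_k}\Big[u_k' M_k(\beta(z)T - z\,\mathrm{Id}_d)^{-1}u_k - \frac{\theta_n\lambda_k^2}{n}\,\mathrm{trace}\big((\beta(z)T - z\,\mathrm{Id}_d)^{-1}TM_n\big)\Big], \tag{3}$$

where $M_n = (B_n - z\,\mathrm{Id}_d)^{-1}$.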

Now, using, for instance, equation (2.3) in [44], we easily obtain

$$\left|\frac{1}{1 + u_k' M_k u_k}\right| \le \frac{|z|}{v}.$$

On the other hand, it is clear that $\mathrm{Im}[\beta(z)] < 0$. As a matter of fact, the
eigenvalues of $M_k$ all have positive imaginary part [if z = u + iv, they are
$1/(\lambda_j(B_{(k)}) - u - iv)$]. Note, also, that $|||M_k|||_2 \le 1/v$. According to our first
remark, the imaginary part of $1 + u_k' M_k u_k$ is positive and the imaginary part
of $1/(1 + u_k' M_k u_k)$ is negative. Hence, the imaginary part of the eigenvalues
of $\beta(z)T - z\,\mathrm{Id}_d$ is smaller than $-v$ (T is positive semidefinite) and their
modulus is greater than v. Therefore,

$$|||\mathrm{Re}[(\beta(z)T - z\,\mathrm{Id}_d)^{-1}]|||_2 \le \frac{1}{v} \quad\text{and}\quad |||\mathrm{Im}[(\beta(z)T - z\,\mathrm{Id}_d)^{-1}]|||_2 \le \frac{1}{v}.$$

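Now, $\beta(z)$ depends on all the $u_k$'s in a nontrivial way, so we cannot apply
our concentration results directly. Also, recall that T is positive semidefinite,
so we can write $T = \sum_{i=1}^d \tau_i e_i e_i'$, with $\tau_i \ge 0$ and orthonormal eigenvectors $e_i$.
So, if b(z) is another complex number, we have

$$\frac{1}{\beta(z)\tau_i - z} - \frac{1}{b(z)\tau_i - z} = \frac{\tau_i\,(b(z) - \beta(z))}{(\beta(z)\tau_i - z)(b(z)\tau_i - z)}$$

and

$$(\beta(z)T - z\,\mathrm{Id}_d)^{-1} - (b(z)T - z\,\mathrm{Id}_d)^{-1} = \sum_{i=1}^d \frac{\tau_i\,(b(z) - \beta(z))}{(\beta(z)\tau_i - z)(b(z)\tau_i - z)}\, e_i e_i'.$$

Therefore, if b(z) is such that $|\beta(z) - b(z)| \le \epsilon$ and $\mathrm{Im}[b(z)] < 0$, we have

$$|||(\beta(z)T - z\,\mathrm{Id}_d)^{-1} - (b(z)T - z\,\mathrm{Id}_d)^{-1}|||_2 \le \frac{\epsilon\,|||T|||_2}{v^2}, \tag{4}$$

$$\big|u_k' M_k(\beta(z)T - z\,\mathrm{Id}_d)^{-1}u_k - u_k' M_k(b(z)T - z\,\mathrm{Id}_d)^{-1}u_k\big| \le \frac{1}{v^3}\,\epsilon\,|||T|||_2\,\|u_k\|_2^2 \tag{5}$$

and

$$\left|\frac{1}{d}\,\mathrm{trace}\big(TM_k[(\beta(z)T - z\,\mathrm{Id}_d)^{-1} - (b(z)T - z\,\mathrm{Id}_d)^{-1}]\big)\right| \le \frac{4\,|||T|||_2^2}{v^3}\,\epsilon \tag{6}$$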



by decomposing the matrices appearing in the trace into real and imaginary 
parts, which are both symmetric in this instance, and using a well-known 
result (see, e.g. [2], Theorem A. 4. 7) on bounds of the trace of a product of 
symmetric matrices. 
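
Identity (2) and the resolvent bounds used in this part of the proof are easy to check numerically. The sketch below does so on a small random example; the sizes, the point z and the Gaussian choice for the vectors playing the role of the u_k are arbitrary assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 12, 5
U = rng.standard_normal((n, d)) / np.sqrt(n)   # row k plays the role of u_k
z = 0.7 + 0.5j
v = z.imag
I = np.eye(d)

B = U.T @ U                                     # B_n = sum_k u_k u_k'
lhs = B @ np.linalg.inv(B - z * I)              # left-hand side of identity (2)

rhs = np.zeros((d, d), dtype=complex)
for k in range(n):
    u = U[k]
    M_k = np.linalg.inv(B - np.outer(u, u) - z * I)    # resolvent of B_n without u_k
    q = 1.0 + u @ M_k @ u                              # 1 + u_k' M_k u_k
    rhs += np.outer(u, u) @ M_k / q
    assert q.imag > 0                                  # positive imaginary part
    assert abs(1.0 / q) <= abs(z) / v + 1e-12          # bound of type (2.3) in [44]
    assert np.linalg.norm(M_k, 2) <= 1.0 / v + 1e-12   # operator norm at most 1/v

print(np.max(np.abs(lhs - rhs)))
```

The printed value is the largest entrywise discrepancy between the two sides of (2) and should be at the level of rounding error.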
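Consider

$$b_n(z) = \frac{1}{n}\sum_{k=1}^n \frac{\theta_n\lambda_k^2}{1 + \xi_n\lambda_k^2\,E(\Omega_1(z))}.$$

Since T is positive semidefinite, it is clear that $\mathrm{Im}[b_n(z)] < 0$. Our aim in the
next few lines is to show that $|b_n(z) - \beta(z)|$ is small. Recall that, according
to Lemma 2.6 in [45], we have, for any Hermitian matrix A,

$$|\mathrm{trace}((M_k - M_n)A)| \le \frac{|||A|||_2}{v}.$$

Applying Corollary 4, page 27, to $r_i$ and the random matrices $\Gamma' M_i \Gamma$,
whose norms are bounded by K = log(p)/v, we see that, for any fixed $\epsilon > 0$
and $\delta > 0$,

$$\max_i \left|\frac{1}{d}\,r_i'\Gamma' M_i \Gamma r_i - E(\Omega_1(z))\right| \le \epsilon\,\frac{(\log p)^{3/2+1/b+\delta}}{\sqrt{p}} = \epsilon\,\gamma_p \quad\text{a.s.}$$

When this happens, we have, if we let $a_k = r_k'\Gamma' M_k \Gamma r_k/d = u_k' M_k u_k/(\xi_n\lambda_k^2)$
and $a = E(\Omega_1(z))$,

$$|\beta(z) - b_n(z)| = \left|\frac{1}{n}\sum_{k=1}^n \frac{\theta_n\xi_n\lambda_k^4\,(a - a_k)}{(1 + \xi_n\lambda_k^2 a_k)(1 + \xi_n\lambda_k^2 a)}\right| \le \epsilon\,\gamma_p\,\frac{1}{n}\sum_{k=1}^n \frac{\theta_n\xi_n\lambda_k^4}{|(1 + \xi_n\lambda_k^2 a_k)(1 + \xi_n\lambda_k^2 a)|}.$$

So, finally, since $|\lambda_k| \le \sqrt{\log(p)}$, we have, for C(z) independent of p,

$$|\beta(z) - b_n(z)| \le C(z)\,\epsilon\,\gamma_p\,(\log p)^2 \quad\text{a.s.}$$

Therefore, since $|||T|||_2 \le (\log p)^2$, using equation (4), we have

$$\left|\int \frac{dH_d(\tau)}{\beta(z)\tau - z} - \int \frac{dH_d(\tau)}{b_n(z)\tau - z}\right| \le \frac{1}{d}\sum_{k=1}^d \left|\frac{1}{\beta(z)\tau_k - z} - \frac{1}{b_n(z)\tau_k - z}\right| \le \frac{C(z)}{v^2}\,\epsilon\,\gamma_p\,(\log p)^4 \to 0 \quad\text{a.s.}$$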



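Similarly, using our concentration bounds from Corollary 4 applied to
$u_k' M_k(b_n(z)T - z\,\mathrm{Id}_d)^{-1}u_k/\lambda_k^2 - \frac{\theta_n}{n}\,\mathrm{trace}((b_n(z)T - z\,\mathrm{Id}_d)^{-1}TM_n)$,
we see that a.s., for C(z) independent of p,

$$\max_{1 \le k \le n}\left|\frac{u_k' M_k(b_n(z)T - z\,\mathrm{Id}_d)^{-1}u_k}{\lambda_k^2} - \frac{\theta_n}{n}\,\mathrm{trace}\big((b_n(z)T - z\,\mathrm{Id}_d)^{-1}TM_n\big)\right| \le C(z)\,\epsilon\,\gamma_p$$

and therefore, writing A(b) for the right-hand side of equation (3) with $\beta(z)$ replaced by b,

$$|A(b_n(z))| = \left|\frac{1}{d}\sum_{k=1}^n \frac{\lambda_k^2}{1 + u_k' M_k u_k}\Big[\frac{u_k' M_k(b_n(z)T - z\,\mathrm{Id}_d)^{-1}u_k}{\lambda_k^2} - \frac{\theta_n}{n}\,\mathrm{trace}\big((b_n(z)T - z\,\mathrm{Id}_d)^{-1}TM_n\big)\Big]\right| \le C(z)\,\epsilon\,\gamma_p\,\frac{|z|}{v}\,\frac{1}{d}\sum_{k=1}^n \lambda_k^2 \to 0 \quad\text{a.s.} \tag{7}$$

We now need to show that $|A(\beta(z))|$ tends to 0 almost surely. To do so, we
study $|A(\beta(z)) - A(b_n(z))|$. Using equation (6), we have

$$\left|\frac{1}{d}\sum_{k=1}^n \frac{1}{1 + u_k' M_k u_k}\Big[\frac{\theta_n\lambda_k^2}{n}\,\mathrm{trace}\big((\beta(z)T - z\,\mathrm{Id}_d)^{-1}TM_n\big) - \frac{\theta_n\lambda_k^2}{n}\,\mathrm{trace}\big((b_n(z)T - z\,\mathrm{Id}_d)^{-1}TM_n\big)\Big]\right| \le \frac{4\,\xi_n\,|||T|||_2^2}{v^3}\,|\beta(z) - b_n(z)|\,\frac{|z|}{v}\,\frac{1}{d}\sum_{k=1}^n \lambda_k^2.$$

On the other hand, using equation (5), we find that

$$\left|\frac{1}{d}\sum_{k=1}^n \frac{1}{1 + u_k' M_k u_k}\big[u_k' M_k(b_n(z)T - z\,\mathrm{Id}_d)^{-1}u_k - u_k' M_k(\beta(z)T - z\,\mathrm{Id}_d)^{-1}u_k\big]\right| \le \frac{|b_n(z) - \beta(z)|\,|z|\,|||T|||_2}{v^4}\,\frac{1}{n}\sum_{k=1}^n \lambda_k^2\,\frac{r_k'\Gamma'\Gamma r_k}{p}.$$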



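Applying Lemma 8, with the matrix $M = \Gamma'\Gamma$, whose operator norm is
bounded by log(p), we get, as above, that

$$\max_k \left|\frac{1}{p}\,r_k'\Gamma'\Gamma r_k - \frac{1}{p}\,\mathrm{trace}(T)\right| \le \epsilon\,\frac{(\log p)^{3/2+1/b+\delta}}{\sqrt{p}} \quad\text{a.s.}$$

Because $|||T|||_2 \le (\log p)^2$, we conclude that a.s.

$$\max_k \frac{1}{p}\,r_k'\Gamma'\Gamma r_k \le 3(\log p)^2.$$

Therefore,

$$\frac{|b_n(z) - \beta(z)|\,|z|\,|||T|||_2}{v^4}\,\frac{1}{n}\sum_{k=1}^n \lambda_k^2\,\frac{1}{p}\,r_k'\Gamma'\Gamma r_k \to 0 \quad\text{a.s.}$$

Hence,

$$|A(\beta(z))| \le |A(\beta(z)) - A(b_n(z))| + |A(b_n(z))| \to 0 \quad\text{a.s.}$$

Since, by equation (3),

$$\int \frac{dH_d(\tau)}{\beta(z)\tau - z} - m_n(z) = A(\beta(z)),$$

we can finally conclude that

$$\int \frac{dH_d(\tau)}{\beta(z)\tau - z} - m_n(z) \to 0 \quad\text{a.s.}$$

This corresponds to the first part of the theorem. Now, note that $\mathrm{Im}[b_n(z)] < 0$
and therefore $|1/(b_n(z)\tau - z)| \le 1/v$. Because $\int |d\tilde H_d(\tau) - dH_d(\tau)| \to 0$, we
conclude that

$$\int \frac{dH_d(\tau)}{b_n(z)\tau - z} - m_n(z) \to 0 \quad\text{a.s.}$$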

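To get to the second part of the theorem, we instead consider

$$T(\beta(z)T - z\,\mathrm{Id}_d)^{-1} - T(B_n - z\,\mathrm{Id}_d)^{-1}.$$

Taking traces and dividing by d, we get

$$\int \frac{\tau\,dH_d(\tau)}{\tau\beta(z) - z} - \frac{1}{d}\,\mathrm{trace}\big(T(B_n - z\,\mathrm{Id}_d)^{-1}\big).$$

To control this quantity, we can use the same expansions we used before,
everywhere replacing $(\beta(z)T - z\,\mathrm{Id}_d)^{-1}$ by $T(\beta(z)T - z\,\mathrm{Id}_d)^{-1}$. This has the
effect of multiplying the upper bounds by $|||T|||_2$, which, under our assumptions,
is bounded by $(\log p)^2$. So, we conclude that

$$\int \frac{\tau\,dH_d(\tau)}{\tau b_n(z) - z} - \Omega_1(z) \to 0 \quad\text{a.s.}$$

Now, the result we obtained using Azuma's inequality shows clearly (see
Remark 1) that

$$\Omega_1(z) - E(\Omega_1(z)) \to 0 \quad\text{a.s.}$$

Letting $w_n(z) = E(\Omega_1(z))$, we have shown that

$$\int \frac{\tau\,dH_d(\tau)}{\tau\int \theta_n\lambda^2\,d\nu_n(\lambda)/(1 + \xi_n\lambda^2 w_n(z)) - z} - w_n(z) \to 0 \quad\text{a.s. and}$$

$$\int \frac{dH_d(\tau)}{\tau\int \theta_n\lambda^2\,d\nu_n(\lambda)/(1 + \xi_n\lambda^2 w_n(z)) - z} - m_n(z) \to 0 \quad\text{a.s.} \tag{8}$$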

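• Subsequence argument to reach the conclusion of Theorem 2
We now need to turn to technical arguments to get from the statement
of equation (8) to that of Theorem 2. Because of our assumption that
$\int \tau\,dH_d(\tau) \le K$ for all d (or p, which is equivalent), with K fixed and
independent of d, we see that $|w_n(z)| \le \mathrm{trace}(T)/(dv) \le K/v$. So, at fixed z,
$w_n(z)$ is bounded. From this sequence, let us extract a convergent subsequence
$w_{m(n)}(z)$, or $w_m$ for short, that converges to w. Through tightness
arguments (see below), we see that $w \in \mathbb{C}^+$. We will now show that w(z)
satisfies

$$\int \frac{\tau\,dH(\tau)}{\tau\int \theta\lambda^2\,d\nu(\lambda)/(1 + \xi\lambda^2 w(z)) - z} - w(z) = 0$$

and that there is a unique solution to this equation in $\mathbb{C}^+$. Let
$b_m(z) = \int \theta_m\lambda^2\,d\nu_m(\lambda)/(1 + \xi_m\lambda^2 w_m(z))$. We first show that
$b_m \to b = \int \theta\lambda^2\,d\nu(\lambda)/(1 + \xi\lambda^2 w(z))$. To do so, note that
$\lambda^2/(1 + w_m\lambda^2) - \lambda^2/(1 + w\lambda^2) = (w - w_m)\lambda^4/[(1 + w\lambda^2)(1 + w_m\lambda^2)]$.
Now, because $w_m \to w \in \mathbb{C}^+$, their imaginary parts are uniformly bounded below
by $\delta$, from which we conclude that, if $w_m \to w \in \mathbb{C}^+$,

$$\int \frac{\lambda^2\,d\nu_m}{1 + w_m\lambda^2} - \int \frac{\lambda^2\,d\nu_m}{1 + w\lambda^2} \to 0.$$

On the other hand, for $w \in \mathbb{C}^+$, $\lambda^2/(1 + w\lambda^2)$ is a bounded continuous function
of $\lambda$. Since $\nu_m \Rightarrow \nu$ and $\xi_m \to \xi$ (so that the distribution of $\sqrt{\xi_m}\lambda$ under $\nu_m$
also converges weakly), we conclude that

$$\int \frac{\lambda^2\,d\nu_m}{1 + w\lambda^2} \to \int \frac{\lambda^2\,d\nu}{1 + w\lambda^2}.$$

Therefore, since $\theta_m \to \theta$, $b_m(z) \to b(z)$. Because we have assumed that $\nu \neq 0$,
we have $b(z) \in \mathbb{C}^-$. By essentially the same arguments, using the fact that
$|\mathrm{Im}[b_m(z)]|$ is bounded below by $\delta$ and $b(z) \in \mathbb{C}^-$, we conclude that

$$\int \frac{\tau\,dH_d(\tau)}{\tau b_{m(n)}(z) - z} - \int \frac{\tau\,dH(\tau)}{\tau b(z) - z} \to 0.$$

In other words,

$$w(z) - \int \frac{\tau\,dH(\tau)}{\tau b(z) - z} = 0,$$

where

$$b(z) = \int \frac{\theta\lambda^2\,d\nu(\lambda)}{1 + \xi\lambda^2 w(z)}.$$

Similarly, we can show that along this subsequence,

$$\int \frac{dH_d(\tau)}{\tau b_m(z) - z} \to \int \frac{dH(\tau)}{\tau b(z) - z}$$

and so we also get the first equation in Theorem 2.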
• Uniqueness of possible limit 

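We now prove that there is a unique solution in $\mathbb{C}^+$ to the equation
characterizing w, the only question remaining to tackle being uniqueness.
To do so, we employ an argument similar to that given in [45], although the
details are slightly different.

Suppose we have two solutions in $\mathbb{C}^+$ to the equation characterizing w(z).
Let us call them $w_1$ and $w_2$, where $b_1$ and $b_2$ are the corresponding b's. We
have

$$w_1 - w_2 = (b_2 - b_1)\int \frac{\tau^2}{(\tau b_1 - z)(\tau b_2 - z)}\,dH(\tau) = (w_1 - w_2)\int \frac{\theta\xi\lambda^4\,d\nu(\lambda)}{(1 + \xi\lambda^2 w_1(z))(1 + \xi\lambda^2 w_2(z))}\int \frac{\tau^2}{(\tau b_1 - z)(\tau b_2 - z)}\,dH(\tau).$$

Let us call f the quantity multiplying $w_1 - w_2$ in the previous equation.
We want to show that |f| < 1. As in [45], using Holder's inequality, we have,
given that $\xi > 0$,

$$|f| \le \left[\int \frac{\theta\xi\lambda^4\,d\nu(\lambda)}{|1 + \xi\lambda^2 w_1(z)|^2}\int \frac{\tau^2\,dH(\tau)}{|\tau b_1 - z|^2}\right]^{1/2}\left[\int \frac{\theta\xi\lambda^4\,d\nu(\lambda)}{|1 + \xi\lambda^2 w_2(z)|^2}\int \frac{\tau^2\,dH(\tau)}{|\tau b_2 - z|^2}\right]^{1/2}.$$

Let us write $w_1 = a + ic$, z = u + iv and $b_1 = \alpha - i\gamma$. By writing the definition
of $b_1$ in terms of $w_1$, we see immediately that

$$\mathrm{Im}[b_1] = -\mathrm{Im}[w_1]\int \frac{\theta\xi\lambda^4\,d\nu(\lambda)}{|1 + \xi\lambda^2 w_1(z)|^2},$$

so $\int \theta\xi\lambda^4\,d\nu(\lambda)/|1 + \xi\lambda^2 w_1(z)|^2 = -\mathrm{Im}[b_1]/\mathrm{Im}[w_1]$. Since $\nu \neq 0$ by our
assumptions, we see that $\gamma > 0$. On the other hand, using the definition of $w_1$
in terms of $b_1$, we see that

$$\mathrm{Im}[w_1] = -\mathrm{Im}[b_1]\int \frac{\tau^2\,dH(\tau)}{|\tau b_1 - z|^2} + \mathrm{Im}[z]\int \frac{\tau\,dH(\tau)}{|\tau b_1 - z|^2}$$

and, therefore, $\mathrm{Im}[w_1] > -\mathrm{Im}[b_1]\int \tau^2\,dH(\tau)/|\tau b_1 - z|^2$ since $H \neq 0$.

Hence,

$$\int \frac{\theta\xi\lambda^4\,d\nu(\lambda)}{|1 + \xi\lambda^2 w_1(z)|^2}\int \frac{\tau^2\,dH(\tau)}{|\tau b_1 - z|^2} < 1$$

(and similarly with $w_2$ and $b_2$), and |f| < 1. We conclude that $w_2 = w_1$, so there is at most one solution to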
the equation characterizing w. 
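
The two imaginary-part identities and the resulting strict inequality can be checked on a numerical solution. The sketch below is an illustration only: the discrete H and nu, the values of theta and xi, and the point z are assumed choices (for which the plain fixed-point iteration used here is a contraction), not quantities from the text.

```python
import numpy as np

# Illustrative discrete H and nu (assumed for this check).
theta, xi = 1.0, 0.5
z = -1.0 + 1.0j
tau, h_wts = np.array([0.5, 2.0]), np.array([0.5, 0.5])
lam2, nu_wts = np.array([0.5, 1.5]), np.array([0.5, 0.5])   # values of lambda^2

def b_of(w):
    """b = int theta*lambda^2 / (1 + xi*lambda^2*w) dnu for the discrete nu."""
    return np.sum(nu_wts * theta * lam2 / (1.0 + xi * lam2 * w))

# Fixed-point iteration for w = int tau / (tau*b(w) - z) dH.
w = 1j
for _ in range(500):
    w = np.sum(h_wts * tau / (tau * b_of(w) - z))
b = b_of(w)

# The integrals appearing in the uniqueness argument.
nu_factor = np.sum(nu_wts * theta * xi * lam2**2 / np.abs(1.0 + xi * lam2 * w) ** 2)
h_factor = np.sum(h_wts * tau**2 / np.abs(tau * b - z) ** 2)
h_first = np.sum(h_wts * tau / np.abs(tau * b - z) ** 2)

print("nu-integral      :", nu_factor, " vs -Im[b]/Im[w] =", -b.imag / w.imag)
print("Im[w]            :", w.imag, " vs ", -b.imag * h_factor + z.imag * h_first)
print("product (must be < 1):", nu_factor * h_factor)
```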

• Tightness of $F^{B_n}$ and consequences for w

Finally, we need to show that the spectral distribution $F^{B_n}$ is tight a.s.
and deduce consequences for w. It is shown (via Lemma 2.3) in [45] that
if $B_n = T_n^{1/2}Y_n^*Y_nT_n^{1/2}$, if the spectral distributions of the $T_n$'s form a tight
sequence and so do the spectral distributions of the $Y_n^*Y_n$'s, then $F^{B_n}$ form
a tight sequence. We note that in our case $B_n = \Gamma R_n' D^2 R_n \Gamma'/n$, which,
up to a number of zeros, has the same eigenvalues as $S^{1/2}R_n' D^2 R_n S^{1/2}/n$;
we temporarily denote by $R_n$ the matrix containing our vectors $r_i$. So, all
we have to show is that $F^{R_n' D^2 R_n/n}$ forms a tight sequence. Note that our
assumption on the convergence of the spectral distribution of the $\lambda$'s implies
that the spectral distributions of the $D^2$'s form a tight sequence. So, all we
need do in order to conclude is to show that $F^{R_n' R_n/n}$ also forms a tight
sequence. But we showed this in Corollary 5. So, $F^{B_n}$ forms a tight sequence,
a.s. Recall that when $\mathrm{trace}(\Sigma)/p$ is uniformly bounded by K, we showed
in Corollary 5 that a.s. $F^{R_n' R_n/n}([M,\infty)) \le (K+1)/M$. So, for any $\epsilon$, we
can find $M_\epsilon$ such that $F^{B_n}[M_\epsilon,\infty) \le \epsilon$, a.s. Using the second inequality in
Lemma 2.3 in [45] and the fact that H and $\nu$ are deterministic, as well as the
fact that if $X_n \Rightarrow X$ and C is a closed set, $\limsup P(X_n \in C) \le P(X \in C)$,
we see that $M_\epsilon$ can be chosen uniformly in $\omega$.

We now want to show that $w \in \mathbb{C}^+$; to do so, we will show that a.s.,
$\mathrm{Im}[w_n]$ is bounded away from zero. Note that $\mathrm{Im}[(B_n - z\,\mathrm{Id})^{-1}]$ is a symmetric
matrix. Its eigenvalues, which we denote by $a_i$, are, if $l_i$ denote the
eigenvalues of $B_n$, $v/((l_i - u)^2 + v^2) \ge v/(2(l_i^2 + u^2) + v^2)$. Assume that
$a_1 \ge a_2 \ge \cdots \ge a_d$. Using Theorem A.4.7 in [2], we see that, if we denote by
$\tau_i$ the decreasingly ordered eigenvalues of T,

$$\mathrm{Im}[\Omega_1(z)] = \mathrm{Im}\left[\frac{1}{d}\,\mathrm{trace}\big(T(B_n - z\,\mathrm{Id}_d)^{-1}\big)\right] \ge \frac{1}{d}\sum_{i=1}^d \tau_i\,a_{d-i+1}.$$

Now, all we need to show is that a.s. a fixed nonzero proportion of $\tau_i a_{d-i+1}$ stay
bounded away from 0. Because $H \neq 0$, we can find $\eta$ such that $H(\eta,\infty) > \epsilon$
for some $\epsilon > 0$. Let us choose such an $\epsilon \neq 0$. In particular, the proportion
of indices for which $\tau_i > \eta$ is a.s. greater than $\epsilon$ because $\liminf H_d(\eta,\infty) \ge
H(\eta,\infty)$, a.s. For this $\epsilon$, we can find $m_\epsilon < \infty$ such that $F^{B_n}[0,m_\epsilon] \ge 1 - \epsilon/2$
a.s., from our arguments above. So, the proportion of i's such that $a_{d-i+1} \ge
v/(2(m_\epsilon^2 + u^2) + v^2)$ is greater than $1 - \epsilon/2$. So, the proportion of i's for
which both $\tau_i > \eta$ and $a_{d-i+1} \ge v/(2(m_\epsilon^2 + u^2) + v^2)$ must be greater than
$\epsilon/2$, a.s. Hence, $\mathrm{Im}[\Omega_1(z)] > \delta > 0$, a.s. Now, we saw that $w_n(z) = E(\Omega_1(z))$
is such that $w_n(z) - \Omega_1(z) \to 0$ a.s. Hence, $\mathrm{Im}[w_n(z)] > \delta > 0$ a.s. and we
can conclude that w is in $\mathbb{C}^+$. So, the subsequence argument given above
is valid and we can go from the result of equation (8) to the main result of
Theorem 2.

So, using the connection between pointwise convergence of Stieltjes trans- 
forms (see [22]) and vague convergence, we have shown that the spectral dis- 
tribution of Bn converges vaguely a.s. to a nonrandom distribution, which 
is uniquely characterized by the system of equations described in Theorem 
2. The a.s. tightness result we obtained for $F^{B_n}$ ensures that the limiting
spectral distribution of Bn is a probability measure and, hence, we have a.s. 
weak convergence, as announced in Theorem 2. This completes the proof.

4. Conclusion. We have shown that the concentration of measure
phenomenon can be seen as an essential tool in the understanding of the
behavior of the limiting spectral distributions of a number of random matrix
models.

Motivated by applications, we have used one aspect of this phenomenon 
to deduce spectral properties of sample correlation matrices from the corre- 
sponding properties for sample covariance matrices. On the other hand, for 
more complicated models, we have generalized known results about random
covariance-type matrices to sample covariance matrices computed from
elliptically distributed data, a type of assumption that is popular in financial
modeling and, further, to generalized elliptically distributed data. We have
done this almost entirely from concentration properties of certain quadratic 
forms. An interesting aspect of the proof is that it leads to new results for 
data coming from distributions for which the dependence between entries of 
the data vector cannot be broken up in a linear fashion. The concentration 
approach also highlights the fact that data vectors coming from a distribu- 
tion having the dimension-free concentration property we used repeatedly 
have, after proper normalization, almost the same norm and are almost
orthogonal to one another (in concrete terms, this remark applies to models
considered in Theorems 1 and 2 when $\lambda_i = 1$). Since this peculiar geometric
feature may not be present in data sets to be analyzed, practitioners should 
probably perform corresponding diagnostic checks before relying on random 
matrix results of the type discussed in this and other papers. 

Interestingly, in all of the models considered, the results tell us that only 
the covariance or the correlation between the entries of the data vector 
matters and the more complicated dependence structure is irrelevant as far 
as limiting distributions of eigenvalues are concerned. 

APPENDIX 

On the covariance matrix of data distributed according to a Gaussian
copula. The concentration results we developed in Section 3.2 require the
covariance matrices of the data at stake to be bounded in operator norm 
by log(p). To be able to apply the results to data distributed according to a 
Gaussian copula, we therefore need to show that this is the case. We have 
the following fact. 

Fact 2. Suppose r is distributed according to a Gaussian copula with
corresponding correlation matrix R. Then, if $\Sigma$ is the covariance matrix of
r, we have

$$|||\Sigma|||_2 \le \frac{1}{2\pi}\Big(|||R|||_2/2 + 4\,|||R|||_2^2\,(\pi/6 - 1/2)\Big).$$

Proof. Recall that if r is distributed according to a Gaussian copula
with corresponding correlation matrix R, r can be generated in the following
way: draw v according to a multivariate normal $\mathcal{N}(0,R)$. Because R is a
correlation matrix, $v_i$, the ith entry of v, is $\mathcal{N}(0,1)$. Now, calling $\Phi$ the c.d.f.
of the standard normal distribution, $r_i = \Phi(v_i)$.

We also recall the standard fact (see [37], Definition 5.28, Proposition 5.29
and Theorem 5.36) that

$$\Sigma_{i,j} = \mathrm{cov}(r_i, r_j) = \frac{1}{2\pi}\arcsin(R_{i,j}/2).$$

Note that $|R_{i,j}| \le 1$. Recall the series expansion of arcsin(x), valid for $|x| \le 1$:



$$\arcsin(x) = \sum_{n=0}^{\infty} a_n x^{2n+1} \quad\text{with}\quad a_n = \frac{(2n)!}{4^n (n!)^2 (2n+1)}.$$



Denote by $E \circ M$ the Hadamard (i.e. entrywise) product of matrices E and
M. In [15], it is shown that if M has nonnegative entries, and M and E are
symmetric, then

$$|||E \circ M|||_2 \le \max_{i,j}|E_{i,j}|\;|||M|||_2.$$

Call g(R/2) the matrix with entries $\arcsin(R_{i,j}/2)$. Then

$$g(R/2) = \frac{R}{2} + \sum_{n=1}^{\infty} a_n \left[\frac{R}{2}\right]^{\circ(2n-1)} \circ \left[\frac{R}{2}\right]^{\circ 2},$$

where $A^{\circ n}$ is the nth Hadamard power (i.e. entrywise) of matrix A. Now
$\max_{i,j}|R_{i,j}/2| \le 1/2$, so

$$\max_{i,j}\big|(R_{i,j}/2)^{2n-1}\big| \le \frac{1}{2^{2n-1}}.$$
Now, using e.g. [7], Problem 1.6.13, we have 



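We therefore have

$$|||g(R/2)|||_2 \le \frac{|||R|||_2}{2} + \sum_{n=1}^{\infty} a_n\,\Big|\Big|\Big|\left[\frac{R}{2}\right]^{\circ(2n-1)} \circ \left[\frac{R}{2}\right]^{\circ 2}\Big|\Big|\Big|_2 \le \frac{|||R|||_2}{2} + \sum_{n=1}^{\infty} a_n\,\frac{|||R|||_2^2}{2^{2n-1}}$$

$$= \frac{|||R|||_2}{2} + 4\,|||R|||_2^2\sum_{n=1}^{\infty}\frac{a_n}{2^{2n+1}} = \frac{|||R|||_2}{2} + 4\,|||R|||_2^2\,(\arcsin(1/2) - 1/2).$$

The result follows since $\arcsin(1/2) = \pi/6$. □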
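
The covariance formula at the heart of this proof, $\mathrm{cov}(r_i, r_j) = \arcsin(R_{i,j}/2)/(2\pi)$, can be checked by simulation. The following sketch is an illustration only: the 3 x 3 correlation matrix, the sample size and the helper name `Phi` are assumptions made for the example.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, -0.3],
              [0.2, -0.3, 1.0]])              # assumed correlation matrix

# Draw v ~ N(0, R) row by row and apply the standard normal c.d.f. entrywise.
n_samples = 200_000
v = rng.standard_normal((n_samples, 3)) @ np.linalg.cholesky(R).T
Phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
r = Phi(v)

Sigma_mc = np.cov(r, rowvar=False)                   # Monte Carlo covariance of r
Sigma_formula = np.arcsin(R / 2.0) / (2.0 * np.pi)   # entrywise arcsin formula

print("max entrywise gap :", np.max(np.abs(Sigma_mc - Sigma_formula)))
print("operator norms    :", np.linalg.norm(Sigma_formula, 2), np.linalg.norm(R, 2))
```

The diagonal of the formula equals 1/12, the variance of a uniform variable on [0, 1], which is a quick sanity check on the constant $1/(2\pi)$.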
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